In [None]:
File Name: Sign_Language_Translation_Analysis.ipynb
Description: This notebook processes data extracted from sign language videos. Each data point represents a frame and is formatted to
ensure consistent dimensions across the dataset.

Email: vathsalakashyap125@gmail.com
Date: 2024-07-28

Project Name: Sign language translation

Overview:
The JSON files originally contain 586 frames. The data is zero-padded from shape (batch_size, 183, 586) to (batch_size, 183, 1024),
and the output is saved in the updated_data.csv file.

Dependencies:
- NumPy (used for handling numerical data and performing mathematical operations)
- Pandas (used for handling structured data and exporting it to CSV)
- json (standard-library module for parsing the JSON files)
- glob (standard-library module for finding files that match a specified pattern)
- os (standard-library module for interacting with the operating system, especially for file path manipulation)

Revision History:
- 2024-07-24: Project initiation and initial setup.
- 2024-07-25: Implemented data extraction from JSON files.
- 2024-07-26: Developed and tested reshaping and padding with zeros.
- 2024-07-27: Bug fixes and error handling.
- 2024-07-28: Completed data export to CSV and documentation updated.

Usage Instructions:
- Run this file using Python 3.x.
- Ensure all dependencies are installed.
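The padding step described in the overview can be sketched with NumPy's `np.pad`, assuming the axis being extended is the last one (586 -> 1024). `pad_time_axis` is a hypothetical helper name, and the random array only stands in for real data. The sketch also truncates inputs that are already longer than the target.

```python
import numpy as np

def pad_time_axis(x: np.ndarray, target_len: int = 1024) -> np.ndarray:
    """Zero-pad the last axis of x up to target_len (truncate if it is longer).

    Hypothetical helper for illustration; not part of the original notebook.
    """
    current = x.shape[-1]
    if current >= target_len:
        return x[..., :target_len]
    # No padding on the leading axes; append zeros at the end of the last axis
    pad_width = [(0, 0)] * (x.ndim - 1) + [(0, target_len - current)]
    return np.pad(x, pad_width, mode="constant", constant_values=0)

# Random stand-in shaped like the overview's input: (batch_size, 183, 586)
rng = np.random.default_rng(0)
batch = rng.random((2, 183, 586))
padded = pad_time_axis(batch, 1024)
print(padded.shape)  # (2, 183, 1024)
```

Padding only at the end of the axis keeps the real values at their original indices, so the first 586 positions are unchanged and everything after them is zero.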


In [10]:
data['version']

1.3

In [11]:
data["people"]

[{'person_id': [-1],
  'hand_pose_face': [0.56953046875,
   0.69633,
   0.443304,
   0.5492343749999999,
   0.67752875,
   0.401323,
   0.51985859375,
   0.66471125,
   0.364914,
   0.50223359375,
   0.67325625,
   0.214353,
   0.48621015625,
   0.68265625,
   0.167932,
   0.51451796875,
   0.6621475,
   0.818723,
   0.48834609375,
   0.6698375,
   0.855601,
   0.47392578125,
   0.6749649999999999,
   0.832159,
   0.46110703124999997,
   0.67752875,
   0.843691,
   0.5150515625,
   0.68265625,
   0.831599,
   0.48834609375,
   0.695475,
   0.831266,
   0.47606171875000003,
   0.7108575,
   0.845845,
   0.46805078125,
   0.72367625,
   0.92117,
   0.51985859375,
   0.70316625,
   0.802339,
   0.49368749999999995,
   0.71683875,
   0.893307,
   0.48353984375000003,
   0.72880375,
   0.834172,
   0.47659609375,
   0.73734875,
   0.807817,
   0.5252,
   0.725385,
   0.756127,
   0.5065062499999999,
   0.7399125,
   0.910313,
   0.49796015625,
   0.74760375,
   0.851193,
   0.49208515625000
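The `hand_pose_face` values above appear to repeat in groups of three: the first two values of each group stay within a narrow band while the third varies much more, which resembles OpenPose's (x, y, confidence) keypoint layout. That reading is an assumption, not something the file states. Under it, a flat list can be grouped into one row per keypoint; `to_keypoint_triples` is a hypothetical helper.

```python
import numpy as np

def to_keypoint_triples(flat_values):
    """Group a flat [x0, y0, c0, x1, y1, c1, ...] list into an (N, 3) array.

    Assumes an (x, y, confidence) layout, which the data does not confirm.
    """
    arr = np.asarray(flat_values, dtype=float)
    if arr.size % 3 != 0:
        raise ValueError(f"expected a multiple of 3 values, got {arr.size}")
    return arr.reshape(-1, 3)

# The first six 'hand_pose_face' values printed above
sample = [0.56953046875, 0.69633, 0.443304,
          0.5492343749999999, 0.67752875, 0.401323]
triples = to_keypoint_triples(sample)
print(triples.shape)  # (2, 3)
```

On the loaded file this would be called as `to_keypoint_triples(data["people"][0]["hand_pose_face"])`; a `ValueError` there would itself be evidence against the triple assumption.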

In [36]:
import json

# Load one sample JSON file
with open('CFRh_KOT9nY_4-5-rgb_front.json', 'r') as f:
    data = json.load(f)

print(f"Type of data: {type(data)}")  # check the top-level type of the loaded data

if isinstance(data, dict):
    print(f"Keys in data: {list(data.keys())}")  # list the top-level keys

Type of data: <class 'dict'>
Keys in data: ['version', 'people']
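Before padding, it helps to check how many entries `people` holds and how many values each `hand_pose_face` list contains, since the shapes in the overview (183 and 586) depend on them. The sketch below assumes every entry has a `hand_pose_face` key, as in the output printed earlier; `summarize_people` is a hypothetical helper and the toy dictionary uses placeholder values.

```python
def summarize_people(data: dict) -> dict:
    """Count entries in data['people'] and collect the distinct list lengths."""
    people = data.get("people", [])
    lengths = {len(person.get("hand_pose_face", [])) for person in people}
    return {"num_entries": len(people), "value_lengths": sorted(lengths)}

# Toy input mirroring the printed structure; the 0.0 values are placeholders
toy = {
    "version": 1.3,
    "people": [
        {"person_id": [-1], "hand_pose_face": [0.0] * 6},
        {"person_id": [-1], "hand_pose_face": [0.0] * 6},
    ],
}
print(summarize_people(toy))  # {'num_entries': 2, 'value_lengths': [6]}
```

Running `summarize_people(data)` on the loaded file would show whether every entry has the same number of values (a single item in `value_lengths`).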


In [12]:
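import numpy as np
import json
import os
from glob import glob

def find_files(directory, pattern='**/*.json'):
    # Sort the matches so the sample order (and CSV row order) is reproducible
    return sorted(glob(os.path.join(directory, pattern), recursive=True))

# Raw string so the Windows backslashes are not treated as escape sequences
json_files = find_files(r'J:\Folders\Downloads\output-20240727T092221Z-001')

number_samples = len(json_files)
frame_length = 183  # values per frame in the original data (axis 1)
time_length = 1024  # target length of the time axis (axis 2)

batch_size = number_samples

# Zero-filled array with the target shape; entries not overwritten below remain as padding
padded_data = np.zeros((batch_size, frame_length, time_length))

# Load each JSON file and copy its frames into the padded array.
# Assumption: each entry of data['people'] is one frame and its 'hand_pose_face'
# list holds that frame's values, so frame j fills padded_data[i, :, j]
# (the (batch_size, 183, 586) -> (batch_size, 183, 1024) layout from the overview).
for i, path in enumerate(json_files):
    with open(path, 'r') as f:
        frames = json.load(f).get('people', [])
    for j, frame in enumerate(frames):
        if j >= time_length:
            break
        try:
            # Convert frame data to float, ensuring it is numeric
            frame_data = [float(value) for value in frame['hand_pose_face']]
        except (KeyError, TypeError, ValueError):
            continue  # skip frames with missing or non-numeric values
        num_values = min(len(frame_data), frame_length)
        padded_data[i, :num_values, j] = frame_data[:num_values]

# Now padded_data has the shape (batch_size, 183, 1024)
print(f"After reshaping the dataset with zeros the current shape is: {padded_data.shape}")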


After reshaping the dataset with zeros the current shape is: (5, 183, 1024)
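The fill step can also be written without file I/O, which makes it easy to test on small inputs. This sketch follows the layout given in the overview, (batch_size, 183, 586) padded to (batch_size, 183, 1024): frame j goes into position j of the last axis and its values fill the middle axis. That mapping, and the name `pad_samples`, are assumptions for illustration.

```python
import numpy as np

def pad_samples(samples, frame_length=183, time_length=1024):
    """Zero-pad a list of samples into an (n, frame_length, time_length) array.

    Each sample is a list of frames and each frame is a list of numbers.
    Frames beyond time_length and values beyond frame_length are dropped.
    """
    out = np.zeros((len(samples), frame_length, time_length))
    for i, frames in enumerate(samples):
        for j, frame in enumerate(frames[:time_length]):
            values = np.asarray(frame, dtype=float)[:frame_length]
            out[i, :values.size, j] = values
    return out

# Two tiny samples with frame_length=2 and time_length=3
demo = pad_samples([[[1, 2], [3, 4]], [[5, 6]]], frame_length=2, time_length=3)
print(demo.shape)  # (2, 2, 3)
```

Keeping `frame_length` and `time_length` as parameters means the same function runs on toy sizes here and on 183 and 1024 for the real data.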


In [15]:
%pip install pandas

Defaulting to user installation because normal site-packages is not writeable
Collecting pandas
  Downloading pandas-2.2.2-cp311-cp311-win_amd64.whl (11.6 MB)
     ---------------------------------------- 11.6/11.6 MB 3.1 MB/s eta 0:00:00
Collecting pytz>=2020.1
  Downloading pytz-2024.1-py2.py3-none-any.whl (505 kB)
     -------------------------------------- 505.5/505.5 kB 2.9 MB/s eta 0:00:00
Collecting tzdata>=2022.7
  Downloading tzdata-2024.1-py2.py3-none-any.whl (345 kB)
     -------------------------------------- 345.4/345.4 kB 3.1 MB/s eta 0:00:00
Installing collected packages: pytz, tzdata, pandas
Successfully installed pandas-2.2.2 pytz-2024.1 tzdata-2024.1
Note: you may need to restart the kernel to use updated packages.



In [16]:
import pandas as pd

In [28]:
# Flatten each sample into one row of 183 * 1024 = 187392 values
flattened_data = padded_data.reshape((batch_size, frame_length * time_length))

# Convert the array to a DataFrame and save it to a CSV file
df = pd.DataFrame(flattened_data)
df.to_csv('updated_data.csv', index=False)
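Flattening with `reshape` uses NumPy's default row-major (C) order, so each CSV row stores the 183 blocks of 1024 values one after another. Assuming that default order, the value at `padded_data[i, j, t]` ends up in column `j * 1024 + t` of row `i`. `column_index` is a hypothetical helper that checks this on a small array.

```python
import numpy as np

def column_index(j, t, time_length=1024):
    """Column of the flattened row that holds element [j, t] (C order)."""
    return j * time_length + t

# Small stand-in: 2 samples, 3 rows of 4 values each
arr = np.arange(2 * 3 * 4).reshape(2, 3, 4)
flat = arr.reshape(2, 3 * 4)
assert flat[1, column_index(2, 1, time_length=4)] == arr[1, 2, 1]

print(column_index(182, 1023))  # 187391, the last of the 187392 columns
```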

In [35]:
df = pd.read_csv('updated_data.csv')

# Print the DataFrame shape 
print("DataFrame shape:", df.shape)

DataFrame shape: (5, 187392)
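The CSV keeps only the flattened 2-D layout (here 5 rows by 187392 columns), so the 3-D shape has to be restored with `reshape` after reading. A minimal round-trip sketch, using an in-memory buffer and small stand-in dimensions instead of `updated_data.csv` and 183 by 1024:

```python
import io

import numpy as np
import pandas as pd

frame_length, time_length = 3, 4  # small stand-ins for 183 and 1024
original = np.random.default_rng(0).random((2, frame_length, time_length))

# Write the flattened array to CSV, as the notebook does with updated_data.csv
buffer = io.StringIO()
pd.DataFrame(original.reshape(len(original), -1)).to_csv(buffer, index=False)
buffer.seek(0)

# Read it back and restore the 3-D shape
restored = pd.read_csv(buffer).to_numpy().reshape(-1, frame_length, time_length)
print(restored.shape)                   # (2, 3, 4)
print(np.allclose(restored, original))  # True
```

For the real file the same idea would be `pd.read_csv('updated_data.csv').to_numpy().reshape(-1, 183, 1024)`. The comparison uses `np.allclose` rather than exact equality as a safeguard against small differences from converting floats to text and back.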
