## Data Collection and Processing

In this part, we define a function that reads a comma-separated data file. Imagine you have a bunch of papers with numbers on them and you want to organize those numbers into a list. This function does exactly that for digital data stored in files: it returns the file's rows as a list, where each row is itself a list of values (read in as strings).

In [1]:
import csv
# Read a comma-separated file and return its rows as a list of lists of strings
def read_data_from_file(filename):
    data = []  # Initialize an empty list to store the data
    # newline='' is what the csv module documentation recommends for files passed to csv.reader
    with open(filename, 'r', newline='') as file:
        csv_reader = csv.reader(file)  # Create a CSV reader object
        for row in csv_reader:  # Iterate over each row in the CSV file
            data.append(row)  # Add the row to our list; its values are still strings
    return data
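To see what read_data_from_file returns, here is a small self-contained check. It writes a hypothetical two-line sample file to a temporary directory (the file name is made up, and the values are borrowed from the first rows shown at the end of this section) and reads it back. Note that every value comes back as a string, not a number.

```python
import csv
import os
import tempfile

def read_data_from_file(filename):
    """Return the rows of a comma-separated file as a list of lists of strings."""
    with open(filename, "r", newline="") as file:
        return list(csv.reader(file))

# Write a tiny hypothetical sample file (two rows, two columns) and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "w", newline="") as f:
        f.write("8.1305,1.0349\n7.9665,1.1684\n")
    rows = read_data_from_file(sample_path)

print(rows)  # [['8.1305', '1.0349'], ['7.9665', '1.1684']]
```

The strings are converted to numbers later, when pandas loads the combined file and infers the column types.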


## Structuring the Data Collection Process

Next, we gather all our data. It's like collecting all the papers from different rooms (activities, subjects, segments) and stacking them together in order. The loop goes through every combination of activity (a01 to a19), subject (p1 to p8), and segment (s01 to s60), reads the corresponding file, puts the activity label in front of it, and adds it to our big collection of data.

In [2]:
# Initialize an empty list to store all the collected data
collected_data = []

# Build each file path and collect the data from every file
for activity_num in range(1, 20):  # Activities a01-a19
    for person_num in range(1, 9):  # Subjects p1-p8
        for session_num in range(1, 61):  # Segments s01-s60 (5 seconds each)
            # Construct the file path for the current activity, subject, and segment
            filename = f"D:/daily+and+sports+activities/data/a{activity_num:02d}/p{person_num}/s{session_num:02d}.txt"
            # Read the rows of sensor readings from the current file
            file_data = read_data_from_file(filename)
            # Prepend the activity label as its own one-item row, so it comes first after flattening
            file_data_with_activity = [[f"A{activity_num}"]] + file_data
            # Add the labeled segment to our collection
            collected_data.append(file_data_with_activity)
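As a quick sanity check on the loop bounds, the sketch below builds the same relative file paths without reading any files and counts them: 19 activities × 8 subjects × 60 segments gives 9,120 files. The data_root value and the segment_paths helper are hypothetical stand-ins for the D:/ path used above, and building paths with pathlib.PurePosixPath is a stylistic choice rather than what the notebook does.

```python
from itertools import product
from pathlib import PurePosixPath

# Hypothetical root folder; replace with the dataset's real location
data_root = PurePosixPath("daily+and+sports+activities/data")

def segment_paths(root):
    """Yield (activity label, path) for every activity/subject/segment file."""
    for activity_num, person_num, segment_num in product(range(1, 20), range(1, 9), range(1, 61)):
        path = root / f"a{activity_num:02d}" / f"p{person_num}" / f"s{segment_num:02d}.txt"
        yield f"A{activity_num}", path

paths = list(segment_paths(data_root))
print(len(paths))  # 9120
label, first_path = paths[0]
print(label, first_path)  # A1 daily+and+sports+activities/data/a01/p1/s01.txt
```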


## Data Aggregation

Here, the collected data is flattened and prepared for analysis. Flattening is like taking each stack of papers and copying all of its numbers, in order, onto a single line of one big spreadsheet: each segment becomes one row that starts with its activity label. Then, this spreadsheet is saved to a new file.

In [3]:
# Flatten the collected data into a format suitable for writing to a file
flattened_data = []
for data_group in collected_data:
    flattened_group = [item for sublist in data_group for item in sublist]
    flattened_data.append(flattened_group)
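Here is what flattening does to one small group of made-up values: the one-item activity row comes first, followed by each sensor row in order. The sketch uses itertools.chain.from_iterable, which produces the same result as the nested list comprehension above. Because every sensor row has the same width, the original rows can be recovered by slicing. For a real segment, the flattened row should have 1 + 125 × 45 = 5,626 entries. The helper names flatten_group and expected_row_length are hypothetical.

```python
from itertools import chain

def flatten_group(data_group):
    """Concatenate the rows of one group into a single flat list."""
    return list(chain.from_iterable(data_group))

# Made-up group: activity label row, then two sensor rows of three values each
group = [["A1"], ["8.1305", "1.0349", "5.4217"], ["7.9665", "1.1684", "5.6755"]]
flat = flatten_group(group)
print(flat)  # ['A1', '8.1305', '1.0349', '5.4217', '7.9665', '1.1684', '5.6755']

# Recover the sensor rows by slicing after the label, one fixed-width chunk at a time
width = 3
rows_back = [flat[1 + i * width : 1 + (i + 1) * width] for i in range((len(flat) - 1) // width)]

def expected_row_length(samples=125, columns_per_sample=45):
    """One label plus every sensor value in a segment."""
    return 1 + samples * columns_per_sample

print(expected_row_length())  # 5626
```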


## Writing Data to a File

First, we open a file named 'sportsdata.txt' for writing.

Next, we prepare the first row of our file, which is called the header. It tells anyone reading the file what each column represents.

We start with "Activity" because that's what we're trying to understand or predict from our data. Then we loop through the numbers 1 to 125 (range(1, 126)) because each 5-second segment has 125 rows of data. For each of those rows, and for each sensor location on the body (limb), type of sensor (axis), and spatial dimension (measurement), we create a unique header. That gives 5 × 3 × 3 = 45 columns per row and 1 + 125 × 45 = 5,626 columns in total. (This part is a bit like labeling jars in a spice rack so we know exactly what's inside each one.)


- Limb codes: "T" for torso, "RA" for right arm, "LA" for left arm, "RL" for right leg, and "LL" for left leg.
- Axis types (sensor types): "acc" for accelerometer, "gryo" for gyroscope (spelled this way in the column names), and "mag" for magnetometer.
- Measurements: "x", "y", and "z" for the three spatial dimensions.
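The nested loops in the next cell can also be written with itertools.product, which varies its last argument fastest and therefore yields the labels in the same order. This is a minimal sketch with a hypothetical build_headers function. It keeps the notebook's "gryo" spelling so the names match the notebook's column labels.

```python
from itertools import product

LIMBS = ["T", "RA", "LA", "RL", "LL"]
SENSORS = ["acc", "gryo", "mag"]  # "gryo" kept to match the notebook's labels
DIMENSIONS = ["x", "y", "z"]

def build_headers(samples=125):
    """Return 'Activity' followed by one label per sample, limb, sensor and dimension."""
    headers = ["Activity"]
    for sample, limb, sensor, dim in product(range(1, samples + 1), LIMBS, SENSORS, DIMENSIONS):
        headers.append(f"{limb}-{dim}_{sensor} ({sample})")
    return headers

headers = build_headers()
print(len(headers))  # 5626
print(headers[1:4])  # ['T-x_acc (1)', 'T-y_acc (1)', 'T-z_acc (1)']
print(headers[-1])   # LL-z_mag (125)
```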

In [4]:
# Open the output file in write mode
with open("sportsdata.txt", "w") as file:
    # Construct the header row: "Activity" plus 125 samples x 45 sensor columns = 5,626 names
    headers = ["Activity"]
    for sample_num in range(1, 126):  # Row (sample) number within the 5-second segment
        for limb in ["T", "RA", "LA", "RL", "LL"]:
            for axis in ["acc", "gryo", "mag"]:  # Sensor type; "gryo" is the gyroscope label
                for measurement in ["x", "y", "z"]:  # Spatial dimension
                    headers.append(f"{limb}-{measurement}_{axis} ({sample_num})")
    
    # Write the header row to the file
    file.write(",".join(headers) + "\n")
    
    # Write the data rows to the file (every value is already a string)
    for data_row in flattened_data:
        file.write(",".join(data_row) + "\n")


Finally, we write our actual data to the file. Each data_row in our flattened list (flattened_data) is already a list of strings, so, as with the headers, we join its values with commas and write the result, adding a newline at the end of each row. Each data_row holds one segment of activity data: its activity label followed by its 125 × 45 = 5,625 sensor values.
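The same write step can also be done with csv.writer instead of joining strings by hand. The writer quotes any value that happens to contain a comma, which ",".join would not do. This is a minimal sketch that writes to an in-memory buffer, using a hypothetical write_rows helper and made-up rows.

```python
import csv
import io

def write_rows(stream, headers, rows):
    """Write a header row followed by data rows as comma-separated lines."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)

buffer = io.StringIO()
write_rows(buffer, ["Activity", "T-x_acc (1)"], [["A1", "8.1305"], ["A1", "7.9665"]])
print(buffer.getvalue())
```

In the notebook, the equivalent call would be write_rows(file, headers, flattened_data) inside a with open("sportsdata.txt", "w", newline="") as file: block.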


This entire process turns our nested, per-file data into a single comma-separated text file with one segment per row, which makes it easy to load and analyze with tools like pandas or even Excel.

In [5]:
import pandas as pd
# Load the combined file: it is comma-separated despite the .txt extension, and its first line is the header
data = pd.read_table("sportsdata.txt", delimiter=",", header=0)
data.head()
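Once the data is loaded, a common next step is to separate the label column from the sensor readings. The following is a minimal sketch that uses a tiny made-up CSV string in place of sportsdata.txt so that it runs on its own.

```python
import io

import pandas as pd

# Tiny made-up stand-in for sportsdata.txt: label column plus two sensor columns
csv_text = "Activity,T-x_acc (1),T-y_acc (1)\nA1,8.1305,1.0349\nA2,7.9665,1.1684\n"
data = pd.read_csv(io.StringIO(csv_text))

y = data["Activity"]               # Activity labels such as "A1"
X = data.drop(columns="Activity")  # Numeric sensor readings

print(X.shape)  # (2, 2)
print(list(y))  # ['A1', 'A2']
```

On the real file, X has 5,625 columns. Because the columns are ordered sample by sample, one row can be viewed as a 125 × 45 array with X.iloc[0].to_numpy().reshape(125, 45).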


| | Activity | T-x_acc (1) | T-y_acc (1) | T-z_acc (1) | T-x_gryo (1) | T-y_gryo (1) | T-z_gryo (1) | T-x_mag (1) | T-y_mag (1) | T-z_mag (1) | ... | RL-z_mag (125) | LL-x_acc (125) | LL-y_acc (125) | LL-z_acc (125) | LL-x_gryo (125) | LL-y_gryo (125) | LL-z_gryo (125) | LL-x_mag (125) | LL-y_mag (125) | LL-z_mag (125) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A1 | 8.1305 | 1.0349 | 5.4217 | -0.009461 | 0.001915 | -0.003424 | -0.78712 | -0.069654 | 0.1573 | ... | -0.036874 | -2.8154 | -9.06 | 2.6025 | -0.003904 | -0.006729 | -0.009789 | 0.73897 | 0.30275 | -0.056262 |
| 1 | A1 | 7.9665 | 1.1684 | 5.6755 | -0.00573 | 0.026995 | -0.009029 | -0.79062 | -0.071635 | 0.13429 | ... | -0.038551 | -2.8233 | -9.0757 | 2.6337 | -0.006769 | -0.006575 | -0.004326 | 0.74027 | 0.30192 | -0.057155 |
| 2 | A1 | 7.8917 | 1.139 | 5.698 | 0.01418 | 0.028722 | -0.009079 | -0.79531 | -0.06946 | 0.12447 | ... | -0.040145 | -2.8091 | -9.0846 | 2.6295 | -0.000714 | -0.002681 | 0.00477 | 0.74072 | 0.30101 | -0.057301 |
| 3 | A1 | 7.9366 | 1.1536 | 5.6318 | 0.003242 | 0.029965 | 0.009111 | -0.79292 | -0.070358 | 0.13194 | ... | -0.041109 | -2.8844 | -9.0849 | 2.6298 | -0.010604 | -0.002827 | -0.004194 | 0.7415 | 0.30305 | -0.055743 |
| 4 | A1 | 7.8913 | 1.1972 | 5.9082 | -0.044333 | -0.067467 | -0.004235 | -0.79592 | -0.073174 | 0.12086 | ... | -0.039495 | -2.8249 | -9.1083 | 2.6322 | 0.013583 | 0.01367 | 0.007613 | 0.74007 | 0.30324 | -0.055548 |
