In [144]:
import json
import numpy as np
from tqdm import tqdm
import re

In [145]:
q_regex = re.compile(r"^\d{1,2}\. |^\W{0,5}Question\s?\d{0,2}\W{0,5}|^\W{0,5}Q\s?\d{1,2}\W{0,5}")
a_regex = re.compile(r"^\d{0,2}\W{0,5}Answer\s?\d{0,2}\W{0,5}|^-|^\d{0,2}\W{0,5}A\s?\d{0,2}\W{0,5}:")


def tuplize_qa_pair(q, a):
    a=a.strip()
    if len(a):
        if a_regex.match(a):
            a_preface, a_content = a_regex.split(a,1)
        elif a[0].isalpha():
            a_content = a
        else:
            raise RuntimeError(f"Answer regex incomplete: {q}\nANSWER:{a}")
        
        a_content = a_content.strip()
        if len(a_content)<10:
            print(f"Small answer dropping: {a_content}")
            return None
        return (q, a_content)
    else:
        return None


def one_example(sample):
    sample_dict = json.loads(sample)
    qa_lines = sample_dict["messages"][2]["content"].split("\n")
    qa_pair_list = []
    q_prefix = ''
    q_content = ''
    a = ''
    missing = 0
    for line in qa_lines:
        line = line.strip()
        if q_regex.match(line) and not a_regex.match(line):
            old_q_prefix = q_prefix
            old_q_content = q_content
            assert len(q_regex.findall(line))==1, q_regex.findall(line)
            q_prefix = q_regex.findall(line)[0]
            q_content  = q_regex.split(line,1)[1]
            if q_prefix == old_q_prefix and re.match(r"\d",q_prefix):
                # Answer recovery when q and a are both 1. 2. etc
                a+=q_content+"\n"
                q_content = old_q_content
            else:
                qa_pair = tuplize_qa_pair(q_content, a)
                if qa_pair is not None:
                    qa_pair_list.append(qa_pair)
                a = ''
        else:
            a+=line+'\n'
    qa_pair = tuplize_qa_pair(q_content, a)
    if qa_pair is not None:
        qa_pair_list.append(qa_pair)
    return qa_pair_list, 10-len(qa_pair_list)

In [146]:
with open("opensqa_10hz_0-13150.jsonl","r") as f:
    samples = f.readlines()
f.close()

In [147]:
all_qa_pair = []
p = 0
for sample in tqdm(samples):
    qa_pairs, missing = one_example(sample)
    assert missing>=0, f"Extra: {qa_pairs} Missing: {missing}"
    if missing>5:
        print(f"{missing} missing:{json.loads(sample)['messages'][2]['content']}")
        print(qa_pairs)
        if missing<10:
            print("Potential corner case")
    else:
        all_qa_pair.extend(qa_pairs)
    p+=missing
p

  0%|                                                      | 0/13150 [00:00<?, ?it/s]

 25%|█████████▊                              | 3226/13150 [00:00<00:00, 16046.82it/s]

Small answer dropping: 
10 missing:1. How can you interpret the fluctuation in the gyroscope data during the stair climbing activity?
2. Based on the accelerometer data pattern, what can you infer about the user's motion while ascending the stairs?
3. Is there a correlation between the initial slight increase in acceleration and the user's motion at the start of stair climbing?
4. How do the fluctuations in acceleration values in the accelerometer data correlate with the changes in speed or step height during the stair climbing activity?
5. What do the noticeable increase in acceleration towards the end of the data signify about the user's position on the stairs?
6. How do the gyroscope readings indicating minor body orientation adjustments relate to the user's balance during the stair climbing activity?
7. Can you explain how the consistent changes in acceleration values align with the user's movement up the stairs?
8. How does the rhythmic pattern of the accelerometer data reflect th

 61%|████████████████████████▍               | 8038/13150 [00:00<00:00, 16002.54it/s]

Small answer dropping: 
10 missing:1. Based on the gyroscopic data provided, can you infer if the biker made any sudden turns or quick maneuvers during the biking activity? Explain your reasoning.
2. How do the patterns in the accelerometer data align with the typical motion of biking, particularly in terms of acceleration and deceleration? Provide detailed insights.
3. Are there any instances in the gyroscope data that could indicate the biker came to a complete stop during the biking activity? Justify your answer with the data provided.
4. Given the consistency in both the gyroscopic and accelerometer data, can you deduce if the terrain for this biking activity was mainly flat or if there were significant inclines encountered? Support your response with data analysis.
5. How do the gyroscopic readings reflect the different biking phases mentioned in the narration, such as uphill climbs or downhill descents? Provide a detailed analysis.
6. Is it possible to determine the speed of the 

 85%|█████████████████████████████████▎     | 11234/13150 [00:00<00:00, 15800.31it/s]

10 missing:1. Given the gyroscope data indicating rhythmic movement, how can we differentiate between the different phases of the walking gait cycle using this information?
2. How do the periodic patterns and varying magnitudes in the accelerometer data help us understand the impact of each step during walking?
3. Can you explain how the transition from a balanced state to fore and aft movements in the gyroscope data is indicative of the walking activity?
4. In what ways do the slight shifts in lateral accelerations captured by the accelerometer data provide insights into the user's movement patterns while walking?
5. How do the gyroscope spikes aligning with abrupt changes in direction or quick movements contribute to understanding the user's walking behavior?
6. Based on the variation in accelerometer values corresponding to step intensity, can we quantify the level of exertion or speed at which the individual is walking?
7. How can the combination of gyroscope and accelerometer data

100%|███████████████████████████████████████| 13150/13150 [00:00<00:00, 15771.49it/s]

Small answer dropping: 
10 missing:1. Can you explain how the gyroscope data reflects the varying angular velocities during walking based on the provided readings?
2. How do the accelerometer readings correspond to the steps taken during walking? Provide a detailed explanation considering the accelerometer data provided.
3. What can the positive and negative values in the gyro data tell us about the different directions of movement during walking? How does this align with the gyroscope readings given?
4. How do the peaks and troughs in the accelerometer data correlate with foot contacts and push-offs during the walking activity described?
5. Based on the gyroscope spikes observed during walking, what insights can you derive about potential turning movements or irregular steps taken during the activity?
6. Can you identify any patterns or anomalies in the gyroscope data that may indicate a specific phase within the walking gait cycle?
7. How do the gyroscope readings in different axes i




2380

In [148]:
len(all_qa_pair)

129118

In [149]:
all_qa_pair[1497]

('How would the data interpretation differ if the user was descending stairs instead of ascending, based on the gyroscope and accelerometer readings?',
 'In the case of descending stairs, the gyroscope data would show decreasing values in the y-axis, indicating downward movement, while the accelerometer data would demonstrate a decrease in z-axis values as the user descends, showcasing a different pattern compared to ascending stairs.')

In [150]:
with open("opensqa_10hz_13150-26288.jsonl","r") as f:
    samples = f.readlines()
f.close()

In [151]:
all_qa_pair = []
p = 0
for sample in tqdm(samples):
    qa_pairs, missing = one_example(sample)
    assert missing>=0, f"Extra: {qa_pairs} Missing: {missing}"
    if missing>5:
        print(f"{missing} missing:{json.loads(sample)['messages'][2]['content']}")
        print(qa_pairs)
        if missing<10:
            print("Potential corner case")
    all_qa_pair+=qa_pairs
    p+=missing
p

  0%|                                                      | 0/13138 [00:00<?, ?it/s]

 25%|█████████▉                              | 3278/13138 [00:00<00:00, 16242.69it/s]

10 missing:1. Can you explain how the gyroscopic data provided aligns with the characteristic features of a walking gait analysis?
2. How do the peaks and valleys in the gyroscopic data correlate with the changes in body orientation during walking?
3. How can the alternating high and low values in the accelerometer data be associated with the user's steps during walking?
4. Based on the combination of gyroscopic and accelerometer data, can you infer the phase of walking where the user experiences the most acceleration variations?
5. Is it possible to determine the user's walking pace or speed solely by analyzing the provided gyroscopic data?
6. How do the variations in acceleration captured by the accelerometer data correspond to the user's changes in pace or direction while walking?
7. Considering the gyroscopic data's steady rhythm and the accelerometer data's consistent acceleration variations, can you identify any instances of abrupt movements or quick turns during the walking acti

 62%|████████████████████████▊               | 8137/13138 [00:00<00:00, 16163.53it/s]

Small answer dropping: 
10 missing:1. Given the gyroscope readings, how do the values in each triplet (x, y, z) axis compare to each other during the sitting activity? What can be inferred from these comparisons?
2. Can the accelerometer data help in determining the placement of the device on the body during the sitting activity? Explain your reasoning based on the sensor readings provided.
3. How do the gyroscope readings indicate the stability of the sitting posture over time? Can you identify any patterns or trends in the gyroscope data that support this?
4. Based on the accelerometer data, can you estimate the gravitational force acting on the sensor during the sitting event? How does this align with the expected gravitational force in a seated posture?
5. Are there any specific characteristics in the gyroscope data that suggest occasional movements or shifts in the seated position? How do these movements manifest in the readings?
6. In the accelerometer data, identify any irregula

 86%|█████████████████████████████████▋     | 11363/13138 [00:00<00:00, 15922.33it/s]

10 missing:1. Based on the gyroscope data indicating rapid changes in all axes and the accelerometer data showing varying acceleration levels, what can we infer about the intensity of the biking activity?
2. How do the high negative y-axis values in the accelerometer data correlate with the gyroscope readings suggesting sharp turns during the biking session?
3. Can you explain how the peaks in the gyroscope z-axis readings aligning with spikes in accelerometer values indicate strong correlations between rapid turns and changes in acceleration during the biking activity?
4. Considering the gyroscope data showing dynamic movement and the accelerometer data revealing different terrains, how do these findings support the user's description of intense biking with sharp turns?
5. How do the gyroscope readings of the sharp maneuvers during biking relate to the accelerometer data showing peaks in acceleration levels?
6. In the context of intense biking with sharp maneuvers, how do the gyroscop

100%|███████████████████████████████████████| 13138/13138 [00:00<00:00, 15968.80it/s]

Small answer dropping: 





3120

In [152]:
all_qa_pair[6497]

('How can the accelerometer data be analyzed to detect any irregularities or anomalies in the walking pattern?',
 'The gyroscope data, specifically along the y-axis, captures the rotational movements of the arms during walking. Consistent shifts in angular velocity along the y-axis correspond to the swinging motion of the arms, providing insight into the overall gait pattern.')

In [153]:
len(all_qa_pair)

128260