<a href="https://colab.research.google.com/github/BhagwatPriyanka/UK-TUS-Analysis/blob/main/Preprocessing_and_cleaning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# UK Time Use Survey 2014-2015 Cleaning and Preprocessing

This notebook is primarily for cleaning and preprocessing the UK Time Use Survey (CTUR 2014-2015) Files.

In [150]:
import pandas as pd
import numpy as np

**Individual File**

This includes biographical information such as country of birth and citizenship, marital status, education, employment, work hours, net individual income, and receipt of benefits. There is information about voluntary work, and help and services provided to others, participation in leisure activities, general health, and enjoyment of activities, subjective wellbeing, and life satisfaction. There is information on the use of non-parental care for each child 0-14 years, and individuals’ caring responsibilities.

In [151]:
individual = pd.read_csv("/content/individual.csv", usecols=["serial", "pnum", "strata", "psu","ind_wt","WorkSta","DMSex","MarStat","DVAge","Citizen1","Relate2"],na_values=" ")

The first few records of individual data looks like this:

In [152]:
individual.head()

Unnamed: 0,serial,strata,psu,pnum,ind_wt,DMSex,WorkSta,DVAge,MarStat,Relate2,Citizen1
0,11010903,-2,-2,1,,1,4,80,3,1,1
1,11010903,-2,-2,2,,2,4,71,3,0,1
2,11010904,-2,-2,1,,2,2,55,3,1,1
3,11010904,-2,-2,2,,1,2,62,3,0,1
4,11010906,-2,-2,1,,2,2,52,3,1,1


In [153]:
# unique id for each household
individual.rename({"serial": "Household id"}, axis=1, inplace=True)

# DMSex is Gender from household grid
# 1 Male 
# 2 Female

individual.rename({"DMSex": "Gender from household grid"}, axis=1, inplace=True)

individual["Gender from household grid"] = individual["Gender from household grid"].map({1: "Male", 2: "Female"})

# WorkSta is major industry code
individual.rename({"WorkSta": "Economic activity status"}, axis=1, inplace=True)

individual["Economic activity status"] = individual["Economic activity status"].map(
    {
        1: "Self employed",
        2: "In paid employment (full or part-time)",
        3: "Unemployed",
        4: "Retired",
        5: "On maternity leave",
        6: "Looking after family or home",
        7: "Full-time student",
        8: "Long-term sick or disabled",
        9: "On a government training scheme",
        10: "Unpaid worker in family business",
        97: "Doing something else",
        -1: "Item not applicable",
        -9: "No answer/refused",
    }
)

# MarStat is Marital status

individual.rename({"MarStat": "Marital status"}, axis=1, inplace=True)

individual["Marital status"] = individual["Marital status"].map(
    {
        1: "Single, never married",
        2: "Cohabiting / living together",
        3: "Married and living with your/his/her husband/wife",
        4: "A civil partner in a legally recognised same-sex Civil Partn",
        5: "Separated, but still legally married",
        6: "Divorced",
        7: "Widowed",
        8: "Separated, but still legally in a same-sex civil partnership",
        -9: "No answer/refused",
        -8: "Don't know",
    }
)

# Citizen1 is Citizenship: UK citizen

individual.rename({"Citizen1": "Citizenship: UK citizen"}, axis=1, inplace=True)

individual["Citizenship: UK citizen"] = individual["Citizenship: UK citizen"].map(
    {
        0: "No",
        1: "Yes",
        -1: "Item not applicable",
        -9: "No answer/refused",      
    }
)

# Relate1 is How related to person 2
individual.rename({"Relate2": "How related to person 2"}, axis=1, inplace=True)

individual["How related to person 2"] = individual["How related to person 2"].map(
    {
        1: "Spouse",
        2: "Civil Partner",
        3: "Cohabiting partner",
        4: "Son/daughter (incl. adopted)",
        5: "Step-son/daughter",
        6: "Foster child",
        7: "Son-in-law/daughter-in-law",
        8: "Parent/guardian",
        9: "Step-parent",
        10: "Foster parent",
        11: "Parent-in-law",
        12: "Brother/sister (incl. adopted)",
        13: "Step-brother/sister",
        14: "Foster brother/sister",
        15: "Brother/sister-in-law",
        16: "Grandchild",
        17: "Grandparent",
        18: "Other relative",
        19: "Other non-relative",
        -9: "No answer/refused",
        -8: "Don't know",  
    }
)

# DVAge is Age.

# ind_wt is weight at the individual level

individual.rename({"ind_wt": "weight at the individual level"}, axis=1, inplace=True)

individual.head(15)

Unnamed: 0,Household id,strata,psu,pnum,weight at the individual level,Gender from household grid,Economic activity status,DVAge,Marital status,How related to person 2,Citizenship: UK citizen
0,11010903,-2,-2,1,,Male,Retired,80,Married and living with your/his/her husband/wife,Spouse,Yes
1,11010903,-2,-2,2,,Female,Retired,71,Married and living with your/his/her husband/wife,,Yes
2,11010904,-2,-2,1,,Female,In paid employment (full or part-time),55,Married and living with your/his/her husband/wife,Spouse,Yes
3,11010904,-2,-2,2,,Male,In paid employment (full or part-time),62,Married and living with your/his/her husband/wife,,Yes
4,11010906,-2,-2,1,,Female,In paid employment (full or part-time),52,Married and living with your/his/her husband/wife,Spouse,Yes
5,11010906,-2,-2,2,,Male,In paid employment (full or part-time),48,Married and living with your/his/her husband/wife,,Yes
6,11010906,-2,-2,3,,Female,Full-time student,18,,Son/daughter (incl. adopted),Item not applicable
7,11010907,-2,-2,1,,Male,In paid employment (full or part-time),36,Married and living with your/his/her husband/wife,Spouse,Yes
8,11010907,-2,-2,2,,Female,Looking after family or home,37,Married and living with your/his/her husband/wife,,Yes
9,11010907,-2,-2,3,,Female,Item not applicable,1,,Son/daughter (incl. adopted),Item not applicable


Replace all -1 (missing values) with NaN

In [154]:
individual = individual.replace(-1, np.NaN)

The Individual File contains information about all the household family members, but we are interested only in the respondent, thus for each unique household id, consider only the first row, since that is information about the respondent.

In [155]:
# interested only in respondent's data
# respondent is the first entry per unique household id
individual.drop_duplicates(subset="Household id", keep="first", inplace=True)

First few rows, after processing the individual data is as follows:

In [156]:
individual.head(20)

Unnamed: 0,Household id,strata,psu,pnum,weight at the individual level,Gender from household grid,Economic activity status,DVAge,Marital status,How related to person 2,Citizenship: UK citizen
0,11010903,-2,-2,1,,Male,Retired,80,Married and living with your/his/her husband/wife,Spouse,Yes
2,11010904,-2,-2,1,,Female,In paid employment (full or part-time),55,Married and living with your/his/her husband/wife,Spouse,Yes
4,11010906,-2,-2,1,,Female,In paid employment (full or part-time),52,Married and living with your/his/her husband/wife,Spouse,Yes
7,11010907,-2,-2,1,,Male,In paid employment (full or part-time),36,Married and living with your/his/her husband/wife,Spouse,Yes
10,11010908,-2,-2,1,,Male,Retired,67,Divorced,,Yes
11,11010911,-2,-2,1,,Male,Unemployed,21,Cohabiting / living together,Cohabiting partner,Yes
13,11010912,-2,-2,1,,Male,Looking after family or home,49,Divorced,Parent/guardian,Yes
15,11010917,-2,-2,1,,Female,In paid employment (full or part-time),42,Cohabiting / living together,Cohabiting partner,Yes
17,11010918,-2,-2,1,,Female,Long-term sick or disabled,42,"Separated, but still legally married",Parent/guardian,Yes
20,11010919,-2,-2,1,,Female,Retired,70,Married and living with your/his/her husband/wife,Spouse,Yes


**Household File**

The household file contains data collected in the household interview. This includes information from the household grid providing information on the gender, age, paid work status, and relationship status of every member of the household. There is information on the household conditions, possessions, net household income from all sources, and information about the help or services households receiv

In [157]:
household = pd.read_csv("/content/household.csv", usecols=["serial", "strata", "psu","hh_wt","Income","NumAdult","NumChild","DVHsize","VehNum"],na_values=" ")

The first few records of household data looks like this:

In [158]:
household.head()

Unnamed: 0,serial,strata,psu,hh_wt,NumAdult,NumChild,DVHsize,VehNum,Income
0,11010903,-2,-2,,2,0,2,2,3000
1,11010904,-2,-2,,2,0,2,2,-9
2,11010906,-2,-2,,3,0,3,4,3200
3,11010907,-2,-2,,2,1,3,1,1100
4,11010908,-2,-2,,1,0,1,2,-9


In [159]:
# unique id for each household
household.rename({"serial": "Household id"}, axis=1, inplace=True)

# NumAdult is Number of adults in household

household.rename({"NumAdult": "Number of adults in household"}, axis=1, inplace=True)

# NumChild is Number of children in household
household.rename({"NumChild": "Number of children in household"}, axis=1, inplace=True)

# DVHsize is Number of people in Household

household.rename({"DVHsize": "Number of people in Household"}, axis=1, inplace=True)

# VehNum is Number of cars or vans

household.rename({"VehNum": "Number of cars or vans"}, axis=1, inplace=True)

# Income is Total monthly household income.
household.rename({"Income": "Total monthly household income"}, axis=1, inplace=True)

In [160]:
household.head(15)

Unnamed: 0,Household id,strata,psu,hh_wt,Number of adults in household,Number of children in household,Number of people in Household,Number of cars or vans,Total monthly household income
0,11010903,-2,-2,,2,0,2,2,3000
1,11010904,-2,-2,,2,0,2,2,-9
2,11010906,-2,-2,,3,0,3,4,3200
3,11010907,-2,-2,,2,1,3,1,1100
4,11010908,-2,-2,,1,0,1,2,-9
5,11010911,-2,-2,,2,0,2,2,120
6,11010912,-2,-2,,1,1,2,2,1500
7,11010917,-2,-2,,2,0,2,1,1900
8,11010918,-2,-2,,2,1,3,-1,-8
9,11010919,-2,-2,,2,0,2,1,2000


**Merging Multiple Files :**


We are going to merge multiple files into two final csv files, one contains respondent information, other contains information about the activities performed by respondent.

1. **Respondent Data**
Combining Respondent Data with Household data using the serial id(household id).

In [161]:
# respondent + household data
respondentCleanedData = individual.merge(household, on=["Household id","strata","psu"])

In [162]:
respondentCleanedData.head(20)

Unnamed: 0,Household id,strata,psu,pnum,weight at the individual level,Gender from household grid,Economic activity status,DVAge,Marital status,How related to person 2,Citizenship: UK citizen,hh_wt,Number of adults in household,Number of children in household,Number of people in Household,Number of cars or vans,Total monthly household income
0,11010903,-2,-2,1,,Male,Retired,80,Married and living with your/his/her husband/wife,Spouse,Yes,,2,0,2,2,3000
1,11010904,-2,-2,1,,Female,In paid employment (full or part-time),55,Married and living with your/his/her husband/wife,Spouse,Yes,,2,0,2,2,-9
2,11010906,-2,-2,1,,Female,In paid employment (full or part-time),52,Married and living with your/his/her husband/wife,Spouse,Yes,,3,0,3,4,3200
3,11010907,-2,-2,1,,Male,In paid employment (full or part-time),36,Married and living with your/his/her husband/wife,Spouse,Yes,,2,1,3,1,1100
4,11010908,-2,-2,1,,Male,Retired,67,Divorced,,Yes,,1,0,1,2,-9
5,11010911,-2,-2,1,,Male,Unemployed,21,Cohabiting / living together,Cohabiting partner,Yes,,2,0,2,2,120
6,11010912,-2,-2,1,,Male,Looking after family or home,49,Divorced,Parent/guardian,Yes,,1,1,2,2,1500
7,11010917,-2,-2,1,,Female,In paid employment (full or part-time),42,Cohabiting / living together,Cohabiting partner,Yes,,2,0,2,1,1900
8,11010918,-2,-2,1,,Female,Long-term sick or disabled,42,"Separated, but still legally married",Parent/guardian,Yes,,2,1,3,-1,-8
9,11010919,-2,-2,1,,Female,Retired,70,Married and living with your/his/her husband/wife,Spouse,Yes,,2,0,2,1,2000


In [163]:
respondentCleanedData.shape

(4721, 17)

In [164]:
respondentCleanedData.isna().sum()

Household id                          0
strata                                0
psu                                   0
pnum                                  0
weight at the individual level      505
Gender from household grid            0
Economic activity status              0
DVAge                                 0
Marital status                       10
How related to person 2            1440
Citizenship: UK citizen               0
hh_wt                               503
Number of adults in household         0
Number of children in household       0
Number of people in Household         0
Number of cars or vans                0
Total monthly household income        0
dtype: int64

In [165]:
respondentCleanedData.dropna()

Unnamed: 0,Household id,strata,psu,pnum,weight at the individual level,Gender from household grid,Economic activity status,DVAge,Marital status,How related to person 2,Citizenship: UK citizen,hh_wt,Number of adults in household,Number of children in household,Number of people in Household,Number of cars or vans,Total monthly household income
11,11011202,110,117,1,0.828011,Female,In paid employment (full or part-time),48,Married and living with your/his/her husband/wife,Spouse,Yes,0.824717,3,1,4,4,3000
12,11011203,110,117,1,1.134924,Male,Retired,75,Married and living with your/his/her husband/wife,Spouse,Yes,1.193769,2,0,2,2,2500
13,11011207,110,117,1,0.745875,Female,Retired,68,Married and living with your/his/her husband/wife,Spouse,Yes,0.793503,2,0,2,1,1500
14,11011209,110,117,1,0.786025,Male,In paid employment (full or part-time),69,Married and living with your/his/her husband/wife,Spouse,Yes,0.835203,2,0,2,1,1500
15,11011210,110,117,1,0.816703,Female,Looking after family or home,29,Married and living with your/his/her husband/wife,Spouse,Yes,0.849499,2,2,4,2,3500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4710,54051007,108,844,1,0.802824,Female,Self employed,58,Married and living with your/his/her husband/wife,Spouse,Yes,0.835038,2,0,2,1,4000
4711,54051013,108,844,1,0.972675,Female,In paid employment (full or part-time),41,Married and living with your/his/her husband/wife,Spouse,Yes,1.014260,2,0,2,2,2400
4713,54051015,108,844,1,0.756571,Male,Self employed,66,Married and living with your/his/her husband/wife,Spouse,Yes,0.793346,2,0,2,2,3000
4714,55051004,102,845,1,0.866010,Female,In paid employment (full or part-time),41,Married and living with your/his/her husband/wife,Spouse,Yes,0.899909,2,0,2,2,2500


In [166]:
respondentCleanedData.to_csv("/content/respondentCleanedData.csv", index=False)

**Codes:**

The codes file contains all the mapping of code to the action performed.

In [167]:
activityCodes = pd.read_csv("/content/ActivityCodes.csv",encoding='cp1252')

The first few rows of activityCodes looks like this:

In [168]:
activityCodes.head()

Unnamed: 0,code,name
0,0,Unspecified personal care
1,110,Sleep
2,111,In bed not asleep
3,120,Sick in bed
4,210,Eating


We create a dictionary to easily map the code to activity in our UK-TUS dataframe.

In [169]:
# create a dictionary using
activityDictionary = activityCodes.to_dict()["name"]

**Wide Fromat File:**

Information from each field of the time diary is presented in a distinct array of variables comprising the 144 10-minute time slots that make up the entire diary day (from 4am to 4am).

In [170]:
wide= pd.read_csv("/content/wide.csv",na_values=" ",usecols=np.r_[0:4,31:2335],low_memory=False)

In [171]:
wide.head

<bound method NDFrame.head of          serial  strata  psu  pnum  ...  dev141  dev142  dev143  dev144
0      11011202     110  117     1  ...       0       0       0       0
1      11011202     110  117     1  ...       0       0       0       0
2      11011202     110  117     4  ...       0       0       0       0
3      11011202     110  117     4  ...       0       0       0       0
4      11011203     110  117     1  ...       0       0       0       0
...         ...     ...  ...   ...  ...     ...     ...     ...     ...
16528  55051011     102  845     2  ...       0       0       0       0
16529  55051014     102  845     1  ...       0       0       0       0
16530  55051014     102  845     1  ...       0       0       0       0
16531  55051020     102  845     1  ...       0       0       0       0
16532  55051020     102  845     1  ...       0       0       0       0

[16533 rows x 2308 columns]>

In [172]:
wide.isna().sum()

serial    0
strata    0
psu       0
pnum      0
act1_1    0
         ..
dev140    0
dev141    0
dev142    0
dev143    0
dev144    0
Length: 2308, dtype: int64

In [173]:
wide.dropna()

Unnamed: 0,serial,strata,psu,pnum,act1_1,act1_2,act1_3,act1_4,act1_5,act1_6,act1_7,act1_8,act1_9,act1_10,act1_11,act1_12,act1_13,act1_14,act1_15,act1_16,act1_17,act1_18,act1_19,act1_20,act1_21,act1_22,act1_23,act1_24,act1_25,act1_26,act1_27,act1_28,act1_29,act1_30,act1_31,act1_32,act1_33,act1_34,act1_35,act1_36,...,dev105,dev106,dev107,dev108,dev109,dev110,dev111,dev112,dev113,dev114,dev115,dev116,dev117,dev118,dev119,dev120,dev121,dev122,dev123,dev124,dev125,dev126,dev127,dev128,dev129,dev130,dev131,dev132,dev133,dev134,dev135,dev136,dev137,dev138,dev139,dev140,dev141,dev142,dev143,dev144
0,11011202,110,117,1,110,110,110,110,110,110,110,110,110,110,110,8219,310,3210,3110,7241,210,3819,210,210,210,3310,3210,3210,3210,3210,3110,3110,3110,3110,7259,5140,5140,5140,5140,5140,...,1,1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,11011202,110,117,1,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,111,310,3110,3310,3310,210,210,210,210,3310,3310,7241,7241,7241,7241,3430,3430,3210,3210,3210,3210,...,1,1,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,11011202,110,117,4,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,310,210,310,310,9210,9210,9210,9210,9210,9210,9210,2110,2110,2110,2110,2110,2110,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,11011202,110,117,4,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,111,111,111,3110,7330,7330,...,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,11011203,110,117,1,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,111,3110,5120,5120,210,5120,5120,5120,210,300,300,300,300,3710,...,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16528,55051011,102,845,2,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
16529,55051014,102,845,1,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,310,310,210,210,210,3310,3130,5140,3310,3310,3210,5310,5310,3610,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
16530,55051014,102,845,1,111,111,111,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,310,210,5140,5110,310,310,310,5110,5140,5140,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
16531,55051020,102,845,1,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,310,310,3110,9110,9110,9110,9110,9110,9110,9110,9110,9110,9110,9110,9110,9110,9110,9110,...,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [174]:
# unique id for each household
wide.rename({"serial": "Household id"}, axis=1, inplace=True)

In [175]:
# respondent is the first entry per unique household id
wide.drop_duplicates(subset="pnum", keep="first", inplace=True)

In [176]:
wide.head()

Unnamed: 0,Household id,strata,psu,pnum,act1_1,act1_2,act1_3,act1_4,act1_5,act1_6,act1_7,act1_8,act1_9,act1_10,act1_11,act1_12,act1_13,act1_14,act1_15,act1_16,act1_17,act1_18,act1_19,act1_20,act1_21,act1_22,act1_23,act1_24,act1_25,act1_26,act1_27,act1_28,act1_29,act1_30,act1_31,act1_32,act1_33,act1_34,act1_35,act1_36,...,dev105,dev106,dev107,dev108,dev109,dev110,dev111,dev112,dev113,dev114,dev115,dev116,dev117,dev118,dev119,dev120,dev121,dev122,dev123,dev124,dev125,dev126,dev127,dev128,dev129,dev130,dev131,dev132,dev133,dev134,dev135,dev136,dev137,dev138,dev139,dev140,dev141,dev142,dev143,dev144
0,11011202,110,117,1,110,110,110,110,110,110,110,110,110,110,110,8219,310,3210,3110,7241,210,3819,210,210,210,3310,3210,3210,3210,3210,3110,3110,3110,3110,7259,5140,5140,5140,5140,5140,...,1,1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,11011202,110,117,4,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,310,210,310,310,9210,9210,9210,9210,9210,9210,9210,2110,2110,2110,2110,2110,2110,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,11011207,110,117,2,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,310,3210,3110,8100,8100,3620,3710,3710,3710,3710,3710,210,210,210,210,310,8100,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
26,11011212,110,117,3,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,111,7250,7250,7250,7250,7250,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
214,11050111,123,143,5,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,111,210,210,210,210,210,210,210,210,210,210,7330,7330,7330,7330,7330,7330,7330,...,-9,-9,0,0,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9


In [177]:
serial_group_wide = wide.groupby(['Household id', 'pnum'])
  
# Print the first value in each group
serial_group_wide.first()

Unnamed: 0_level_0,Unnamed: 1_level_0,strata,psu,act1_1,act1_2,act1_3,act1_4,act1_5,act1_6,act1_7,act1_8,act1_9,act1_10,act1_11,act1_12,act1_13,act1_14,act1_15,act1_16,act1_17,act1_18,act1_19,act1_20,act1_21,act1_22,act1_23,act1_24,act1_25,act1_26,act1_27,act1_28,act1_29,act1_30,act1_31,act1_32,act1_33,act1_34,act1_35,act1_36,act1_37,act1_38,...,dev105,dev106,dev107,dev108,dev109,dev110,dev111,dev112,dev113,dev114,dev115,dev116,dev117,dev118,dev119,dev120,dev121,dev122,dev123,dev124,dev125,dev126,dev127,dev128,dev129,dev130,dev131,dev132,dev133,dev134,dev135,dev136,dev137,dev138,dev139,dev140,dev141,dev142,dev143,dev144
Household id,pnum,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1
11011202,1,110,117,110,110,110,110,110,110,110,110,110,110,110,8219,310,3210,3110,7241,210,3819,210,210,210,3310,3210,3210,3210,3210,3110,3110,3110,3110,7259,5140,5140,5140,5140,5140,7259,7259,...,1,1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
11011202,4,110,117,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,310,210,310,310,9210,9210,9210,9210,9210,9210,9210,2110,2110,2110,2110,2110,2110,2110,2110,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
11011207,2,110,117,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,310,3210,3110,8100,8100,3620,3710,3710,3710,3710,3710,210,210,210,210,310,8100,9370,3620,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
11011212,3,110,117,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,111,7250,7250,7250,7250,7250,310,3110,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
11050111,5,123,143,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,111,210,210,210,210,210,210,210,210,210,210,7330,7330,7330,7330,7330,7330,7330,7330,7330,...,-9,-9,0,0,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9
11221215,6,101,101,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
12080917,7,148,189,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,111,310,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
12201006,8,151,195,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,310,310,210,8210,8210,6120,6120,6120,6120,6120,310,310,310,310,...,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [178]:
#activities = wide.loc[:,'act1_1':'act1_144']
#activities.head()

In [179]:
#other_activity1 = wide.loc[:,'othact1_1':'othact1_144']
#other_activity1.head()

In [180]:
#other_activity2 = wide.loc[:,'othact2_1':'othact2_144']
#other_activity2.head()

In [181]:
#other_activity3 = wide.loc[:,'othact3_1':'othact3_144']
#other_activity3.head()

In [182]:
#location_activity= wide.loc[:,'wher_1':'wher_144']
#location_activity.head()

In [183]:
#Activity_alone= wide.loc[:,'wit0_1':'wit0_144']
#Activity_alone.head()

In [184]:
#Activity_with_spouse= wide.loc[:,'wit1_1':'wit1_144']
#Activity_with_spouse.head()

In [185]:
#Activity_with_mother= wide.loc[:,'wit2_1':'wit2_144']
#Activity_with_mother.head()

In [186]:
#Activity_with_father= wide.loc[:,'wit3_1':'wit3_144']
#Activity_with_father.head()

In [187]:
#Activity_with_child_0_7= wide.loc[:,'wit4_1':'wit4_144']
#Activity_with_child_0_7.head()

In [188]:
#Activity_with_child_above8= wide.loc[:,'wit5_1':'wit5_144']
#Activity_with_child_above8.head()

In [189]:
#Activity_with_otherperson= wide.loc[:,'wit6_1':'wit6_144']
#Activity_with_otherperson.head()

In [190]:
#Activity_with_No_copresence = wide.loc[:,'wit7_1':'wit7_144']
#Activity_with_No_copresence.head()

In [191]:
#Activity_with_Sleep_Work_Education= wide.loc[:,'wit8_1':'wit8_144']
#Activity_with_Sleep_Work_Education.head()

In [192]:
#Enjoyment_activity= wide.loc[:,'enj1':'enj144']
#Enjoyment_activity.head()

In [193]:
#Device_use_in_activity= wide.loc[:,'dev1':'dev144']
#Device_use_in_activity.head()

In [194]:
episode = pd.read_csv("/content/episode.csv", na_values= " ",usecols=np.r_[0:4,32:49])
episode.head()

Unnamed: 0,serial,strata,psu,pnum,eptime,whatdoing,What_Oth1,What_Oth2,What_Oth3,WhereWhen,Device,WithAlone,WithSpouse,WithMother,WithFather,WithChild,WithOther,WithOtherYK,WithMiss,WithNA,Enjoy
0,11011202,110,117,1,110,110,-9,-9,-9,11,0,0,1,0,0,0,1,0,0,1,7
1,11011202,110,117,1,10,8219,111,-9,-9,11,0,0,1,0,0,0,1,0,0,0,7
2,11011202,110,117,1,10,310,-9,-9,-9,11,0,0,0,0,0,0,0,0,1,0,3
3,11011202,110,117,1,10,3210,-9,-9,-9,11,0,0,0,0,0,0,0,0,1,0,3
4,11011202,110,117,1,10,3110,-9,-9,-9,11,1,0,0,0,0,0,0,0,1,0,5


In [195]:
# unique id for each household
episode.rename({"serial": "Household id"}, axis=1, inplace=True)

In [196]:
serial_group_ep = episode.groupby(['Household id', 'pnum'])
  
# Print the first value in each group
serial_group_ep.first()

Unnamed: 0_level_0,Unnamed: 1_level_0,strata,psu,eptime,whatdoing,What_Oth1,What_Oth2,What_Oth3,WhereWhen,Device,WithAlone,WithSpouse,WithMother,WithFather,WithChild,WithOther,WithOtherYK,WithMiss,WithNA,Enjoy
Household id,pnum,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1
11011202,1,110,117,110,110,-9,-9,-9,11,0,0,1,0,0,0,1,0,0,1,7
11011202,4,110,117,190,110,-9,-9,-9,11,0,0,0,0,0,0,0,0,1,1,7
11011203,1,110,117,210,110,-9,-9,-9,11,0,0,1,0,0,0,0,1,0,1,7
11011207,1,110,117,10,110,-9,-9,-9,11,0,1,0,0,0,0,0,0,0,1,7
11011207,2,110,117,10,110,-9,-9,-9,11,0,1,0,0,0,0,0,0,0,1,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
55051004,2,102,845,10,110,-9,-9,-9,11,0,0,1,0,0,0,0,0,0,1,6
55051011,1,102,845,10,310,390,-9,-9,11,0,0,0,0,0,0,1,0,0,0,6
55051011,2,102,845,10,110,-9,-9,-9,11,0,0,1,0,0,0,0,0,0,1,7
55051014,1,102,845,180,110,-9,-9,-9,11,0,0,1,0,0,0,1,0,0,1,4


In [197]:
episode.shape

(587632, 21)

In [198]:
#episode.drop_duplicates(subset="serial", keep="first", inplace=True)

In [199]:
#episode.head()

In [200]:
episode.isna().sum()

Household id    0
strata          0
psu             0
pnum            0
eptime          0
whatdoing       0
What_Oth1       0
What_Oth2       0
What_Oth3       0
WhereWhen       0
Device          0
WithAlone       0
WithSpouse      0
WithMother      0
WithFather      0
WithChild       0
WithOther       0
WithOtherYK     0
WithMiss        0
WithNA          0
Enjoy           0
dtype: int64

**Merging files**

**2. Activity Data**

Combining activity (wide format) data with episode format data to get information about who was with respondent when activity was being performed. The merge is done using unique household id id and person number.

In [201]:
# wide + episode data
activityCleanedData = wide.merge(episode, on=["Household id","strata","psu","pnum"])

In [202]:
serial_group_act = activityCleanedData.groupby(['Household id', 'pnum'])
  
# Print the first value in each group
serial_group_act.first()

Unnamed: 0_level_0,Unnamed: 1_level_0,strata,psu,act1_1,act1_2,act1_3,act1_4,act1_5,act1_6,act1_7,act1_8,act1_9,act1_10,act1_11,act1_12,act1_13,act1_14,act1_15,act1_16,act1_17,act1_18,act1_19,act1_20,act1_21,act1_22,act1_23,act1_24,act1_25,act1_26,act1_27,act1_28,act1_29,act1_30,act1_31,act1_32,act1_33,act1_34,act1_35,act1_36,act1_37,act1_38,...,dev122,dev123,dev124,dev125,dev126,dev127,dev128,dev129,dev130,dev131,dev132,dev133,dev134,dev135,dev136,dev137,dev138,dev139,dev140,dev141,dev142,dev143,dev144,eptime,whatdoing,What_Oth1,What_Oth2,What_Oth3,WhereWhen,Device,WithAlone,WithSpouse,WithMother,WithFather,WithChild,WithOther,WithOtherYK,WithMiss,WithNA,Enjoy
Household id,pnum,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1
11011202,1,110,117,110,110,110,110,110,110,110,110,110,110,110,8219,310,3210,3110,7241,210,3819,210,210,210,3310,3210,3210,3210,3210,3110,3110,3110,3110,7259,5140,5140,5140,5140,5140,7259,7259,...,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,110,110,-9,-9,-9,11,0,0,1,0,0,0,1,0,0,1,7
11011202,4,110,117,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,310,210,310,310,9210,9210,9210,9210,9210,9210,9210,2110,2110,2110,2110,2110,2110,2110,2110,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,190,110,-9,-9,-9,11,0,0,0,0,0,0,0,0,1,1,7
11011207,2,110,117,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,310,3210,3110,8100,8100,3620,3710,3710,3710,3710,3710,210,210,210,210,310,8100,9370,3620,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,110,-9,-9,-9,11,0,1,0,0,0,0,0,0,0,1,7
11011212,3,110,117,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,111,7250,7250,7250,7250,7250,310,3110,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,300,110,-9,-9,-9,11,0,1,0,0,0,0,0,0,0,1,7
11050111,5,123,143,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,111,210,210,210,210,210,210,210,210,210,210,7330,7330,7330,7330,7330,7330,7330,7330,7330,...,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,180,110,-9,-9,-9,11,-9,0,0,0,0,0,0,0,1,1,-9
11221215,6,101,101,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,390,110,-9,-9,-9,11,0,1,0,0,0,0,0,0,0,1,6
12080917,7,148,189,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,111,310,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,360,110,-9,-9,-9,11,0,1,0,0,0,0,0,0,0,1,7
12201006,8,151,195,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,310,310,210,8210,8210,6120,6120,6120,6120,6120,310,310,310,310,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,240,110,-9,-9,-9,11,0,1,0,0,0,0,0,0,0,1,7


In [203]:
# add weightage from respondent data
activityCleanedData = activityCleanedData.merge(
    individual[["Household id", "weight at the individual level"]], on="Household id"
)

In [204]:
activityCleanedData.head()

Unnamed: 0,Household id,strata,psu,pnum,act1_1,act1_2,act1_3,act1_4,act1_5,act1_6,act1_7,act1_8,act1_9,act1_10,act1_11,act1_12,act1_13,act1_14,act1_15,act1_16,act1_17,act1_18,act1_19,act1_20,act1_21,act1_22,act1_23,act1_24,act1_25,act1_26,act1_27,act1_28,act1_29,act1_30,act1_31,act1_32,act1_33,act1_34,act1_35,act1_36,...,dev123,dev124,dev125,dev126,dev127,dev128,dev129,dev130,dev131,dev132,dev133,dev134,dev135,dev136,dev137,dev138,dev139,dev140,dev141,dev142,dev143,dev144,eptime,whatdoing,What_Oth1,What_Oth2,What_Oth3,WhereWhen,Device,WithAlone,WithSpouse,WithMother,WithFather,WithChild,WithOther,WithOtherYK,WithMiss,WithNA,Enjoy,weight at the individual level
0,11011202,110,117,1,110,110,110,110,110,110,110,110,110,110,110,8219,310,3210,3110,7241,210,3819,210,210,210,3310,3210,3210,3210,3210,3110,3110,3110,3110,7259,5140,5140,5140,5140,5140,...,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,110,110,-9,-9,-9,11,0,0,1,0,0,0,1,0,0,1,7,0.828011
1,11011202,110,117,1,110,110,110,110,110,110,110,110,110,110,110,8219,310,3210,3110,7241,210,3819,210,210,210,3310,3210,3210,3210,3210,3110,3110,3110,3110,7259,5140,5140,5140,5140,5140,...,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,8219,111,-9,-9,11,0,0,1,0,0,0,1,0,0,0,7,0.828011
2,11011202,110,117,1,110,110,110,110,110,110,110,110,110,110,110,8219,310,3210,3110,7241,210,3819,210,210,210,3310,3210,3210,3210,3210,3110,3110,3110,3110,7259,5140,5140,5140,5140,5140,...,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,310,-9,-9,-9,11,0,0,0,0,0,0,0,0,1,0,3,0.828011
3,11011202,110,117,1,110,110,110,110,110,110,110,110,110,110,110,8219,310,3210,3110,7241,210,3819,210,210,210,3310,3210,3210,3210,3210,3110,3110,3110,3110,7259,5140,5140,5140,5140,5140,...,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,3210,-9,-9,-9,11,0,0,0,0,0,0,0,0,1,0,3,0.828011
4,11011202,110,117,1,110,110,110,110,110,110,110,110,110,110,110,8219,310,3210,3110,7241,210,3819,210,210,210,3310,3210,3210,3210,3210,3110,3110,3110,3110,7259,5140,5140,5140,5140,5140,...,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,3110,-9,-9,-9,11,1,0,0,0,0,0,0,0,1,0,5,0.828011


In [205]:
activityCleanedData.to_csv("/content/activityDataCleaned.csv", index=False)

We now perform analysis.