# Checkpoint Three: Cleaning Data

Now you are ready to clean your data. Before starting coding, provide the link to your dataset below.

My dataset:
https://beta.bls.gov/dataViewer/view/timeseries/LNS14000000;jsessionid=C18C91F0E4B005625E7BE1D4D99553D5
https://fred.stlouisfed.org/series/IHLCHGNEWUS
https://catalog.data.gov/dataset/mental-health-care-in-the-last-4-weeks
https://beta.bls.gov/dataViewer/view/timeseries/LNS14000025
https://beta.bls.gov/dataViewer/view/timeseries/LNS14000026

Import the necessary libraries and create your dataframe(s).

In [3]:
import pandas as pd

In [9]:
# load data
employment_rate = pd.read_csv("BLS employment rate.csv")
job_postings = pd.read_csv("Indeed job posting freq.csv")
mental_health = pd.read_csv("Mental_Health_Care_in_the_Last_4_Weeks.csv")
unemployment_for_men = pd.read_csv("Unemployment rate for men.csv")
unemployment_for_women = pd.read_csv("Unemployment rate for women.csv")

## Missing Data

Test your dataset for missing data and handle it as needed. Make notes in the form of code comments as to your thought process.

In [5]:
# check for empty columns
employment_rate.info()

# the data doesn't contain any null fields after calling info()
# there are 41 rows and all columns have 41 non-null fields

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41 entries, 0 to 40
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Series ID  41 non-null     object 
 1   Year       41 non-null     int64  
 2   Period     41 non-null     object 
 3   Label      41 non-null     object 
 4   Value      41 non-null     float64
dtypes: float64(1), int64(1), object(3)
memory usage: 1.7+ KB


In [6]:
# check for empty columns
job_postings.info()

# the data doesn't contain any null fields after calling info()
# there are 854 rows and all the columns contain 854 non-null fields

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 854 entries, 0 to 853
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   DATE         854 non-null    object 
 1   IHLCHGNEWUS  854 non-null    float64
dtypes: float64(1), object(1)
memory usage: 13.5+ KB


In [10]:
# check for empty columns
mental_health.info()

# the data contains null fields after calling info() so we will need to clean this data
# there are 10404 rows, but not all columns are non-null
# 9   Value                   9914 non-null   float64
# 10  LowCI                   9914 non-null   float64
# 11  HighCI                  9914 non-null   float64
# 12  Confidence Interval     9914 non-null   object 
# 13  Quartile Range          6732 non-null   object 
# 14  Suppression Flag        22 non-null     float64



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10404 entries, 0 to 10403
Data columns (total 15 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Indicator               10404 non-null  object 
 1   Group                   10404 non-null  object 
 2   State                   10404 non-null  object 
 3   Subgroup                10404 non-null  object 
 4   Phase                   10404 non-null  object 
 5   Time Period             10404 non-null  int64  
 6   Time Period Label       10404 non-null  object 
 7   Time Period Start Date  10404 non-null  object 
 8   Time Period End Date    10404 non-null  object 
 9   Value                   9914 non-null   float64
 10  LowCI                   9914 non-null   float64
 11  HighCI                  9914 non-null   float64
 12  Confidence Interval     9914 non-null   object 
 13  Quartile Range          6732 non-null   object 
 14  Suppression Flag        22 non-null   

In [14]:
# for the Value column, I'm choosing to drop the null rows because 
# since we have a lot of data, it shouldn't affect the end result much, and
# filling them will the mean or median won't be possible because the values all
# don't belong to the same indicator

# for the other columns, they won't need cleaning, because they are unnecessary
mental_health.dropna(subset = ["Value"], inplace = True)

# check for empty fields in Value
mental_health.info()

# now, there are 9914 entries, and the Value column also has 9914 non-null fields.
# 9   Value                   9914 non-null   float64

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9914 entries, 0 to 10403
Data columns (total 15 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Indicator               9914 non-null   object 
 1   Group                   9914 non-null   object 
 2   State                   9914 non-null   object 
 3   Subgroup                9914 non-null   object 
 4   Phase                   9914 non-null   object 
 5   Time Period             9914 non-null   int64  
 6   Time Period Label       9914 non-null   object 
 7   Time Period Start Date  9914 non-null   object 
 8   Time Period End Date    9914 non-null   object 
 9   Value                   9914 non-null   float64
 10  LowCI                   9914 non-null   float64
 11  HighCI                  9914 non-null   float64
 12  Confidence Interval     9914 non-null   object 
 13  Quartile Range          6723 non-null   object 
 14  Suppression Flag        0 non-null     

In [12]:
# check for empty columns
unemployment_for_men.info()

# the data doesn't contain any null fields after calling info()
# there are 29 rows and all the columns contain 29 non-null fields

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29 entries, 0 to 28
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Series ID  29 non-null     object 
 1   Year       29 non-null     int64  
 2   Period     29 non-null     object 
 3   Label      29 non-null     object 
 4   Value      29 non-null     float64
dtypes: float64(1), int64(1), object(3)
memory usage: 1.3+ KB


In [13]:
# check for empty columns
unemployment_for_women.info()

# the data doesn't contain any null fields after calling info()
# there are 29 rows and all the columns contain 29 non-null fields

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29 entries, 0 to 28
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Series ID  29 non-null     object 
 1   Year       29 non-null     int64  
 2   Period     29 non-null     object 
 3   Label      29 non-null     object 
 4   Value      29 non-null     float64
dtypes: float64(1), int64(1), object(3)
memory usage: 1.3+ KB


## Irregular Data

Detect outliers in your dataset and handle them as needed. Use code comments to make notes about your thought process.

In [None]:
# we are going to be sampling any random 10 rows to check for irregularities

In [16]:
employment_rate.sample(10)

# it looks like the Label column is supposed to be a date, so we can proceed to format it correctly

Unnamed: 0,Series ID,Year,Period,Label,Value
28,LNS14000000,2021,M05,2021 May,5.8
9,LNS14000000,2019,M10,2019 Oct,3.6
36,LNS14000000,2022,M01,2022 Jan,4.0
34,LNS14000000,2021,M11,2021 Nov,4.2
15,LNS14000000,2020,M04,2020 Apr,14.7
0,LNS14000000,2019,M01,2019 Jan,4.0
4,LNS14000000,2019,M05,2019 May,3.6
29,LNS14000000,2021,M06,2021 Jun,5.9
8,LNS14000000,2019,M09,2019 Sep,3.5
39,LNS14000000,2022,M04,2022 Apr,3.6


In [18]:
employment_rate['Label'] = pd.to_datetime(employment_rate['Label'])

# check to see it's fixed
employment_rate.sample(10)

Unnamed: 0,Series ID,Year,Period,Label,Value
15,LNS14000000,2020,M04,2020-04-01,14.7
8,LNS14000000,2019,M09,2019-09-01,3.5
14,LNS14000000,2020,M03,2020-03-01,4.4
4,LNS14000000,2019,M05,2019-05-01,3.6
21,LNS14000000,2020,M10,2020-10-01,6.9
31,LNS14000000,2021,M08,2021-08-01,5.2
22,LNS14000000,2020,M11,2020-11-01,6.7
27,LNS14000000,2021,M04,2021-04-01,6.0
3,LNS14000000,2019,M04,2019-04-01,3.6
30,LNS14000000,2021,M07,2021-07-01,5.4


In [19]:
job_postings.sample(10)

# no irregularities found in the data

Unnamed: 0,DATE,IHLCHGNEWUS
189,2020-08-08,-12.7
153,2020-07-03,-13.6
74,2020-04-15,-51.4
397,2021-03-04,31.4
739,2022-02-09,90.3
784,2022-03-26,71.2
119,2020-05-30,-39.2
535,2021-07-20,64.8
61,2020-04-02,-51.0
771,2022-03-13,76.2


In [20]:
mental_health.sample(10)

# it looks like the "Time Period Start Date" and "Time Period End Date" can be formatted better

Unnamed: 0,Indicator,Group,State,Subgroup,Phase,Time Period,Time Period Label,Time Period Start Date,Time Period End Date,Value,LowCI,HighCI,Confidence Interval,Quartile Range,Suppression Flag
385,"Received Counseling or Therapy, Last 4 Weeks",By State,California,California,2,14,"Sep 2 - Sep 14, 2020",09/02/2020,09/14/2020,8.9,8.0,9.9,8.0 - 9.9,7.8-8.9,
5102,Took Prescription Medication for Mental Health...,By Disability status,United States,Without disability,3.1,30,"May 12 - May 24, 2021",05/12/2021,05/24/2021,17.9,17.3,18.4,17.3 - 18.4,,
9594,Took Prescription Medication for Mental Health...,By Race/Hispanic ethnicity,United States,"Non-Hispanic White, single race",3.4,43,"Mar 2 - Mar 14, 2022",03/02/2022,03/14/2022,30.1,29.3,30.8,29.3 - 30.8,,
4264,Took Prescription Medication for Mental Health...,By Age,United States,50 - 59 years,3 (Jan 6 � Mar 29),27,"Mar 17 - Mar 29, 2021",03/17/2021,03/29/2021,24.5,23.2,25.8,23.2 - 25.8,,
7622,Needed Counseling or Therapy But Did Not Get I...,By State,Oregon,Oregon,3.2,37,"Sep 1 - Sep 13, 2021",09/01/2021,09/13/2021,15.8,13.5,18.5,13.5 - 18.5,12.6-17.7,
5280,Took Prescription Medication for Mental Health...,By State,New Hampshire,New Hampshire,3.1,30,"May 12 - May 24, 2021",05/12/2021,05/24/2021,24.0,20.4,27.9,20.4 - 27.9,22.6-25.1,
2864,Took Prescription Medication for Mental Health...,By State,Utah,Utah,3 (Jan 6 � Mar 29),22,"Jan 6 - Jan 18, 2021",01/06/2021,01/18/2021,29.1,26.2,32.1,26.2 - 32.1,27.5-32.1,
4732,Needed Counseling or Therapy But Did Not Get I...,By Disability status,United States,Without disability,3.1,28,"Apr 14 - Apr 26, 2021",04/14/2021,04/26/2021,7.8,7.3,8.3,7.3 - 8.3,,
7727,"Received Counseling or Therapy, Last 4 Weeks",By Gender identity,United States,Cis-gender female,3.2,38,"Sep 15 - Sep 27, 2021",09/15/2021,09/27/2021,12.0,11.4,12.6,11.4 - 12.6,,
8368,Took Prescription Medication for Mental Health...,By State,South Carolina,South Carolina,3.3,40,"Dec 1 - Dec 13, 2021",12/01/2021,12/13/2021,24.6,20.4,29.2,20.4 - 29.2,23.6-27.0,


In [21]:
tm_start = "Time Period Start Date"
tm_end = "Time Period End Date"
mental_health[tm_start] = pd.to_datetime(mental_health[tm_start])
mental_health[tm_end] = pd.to_datetime(mental_health[tm_end])

# check to see it's fixed
mental_health.sample(10)

Unnamed: 0,Indicator,Group,State,Subgroup,Phase,Time Period,Time Period Label,Time Period Start Date,Time Period End Date,Value,LowCI,HighCI,Confidence Interval,Quartile Range,Suppression Flag
8570,Took Prescription Medication for Mental Health...,By State,Oklahoma,Oklahoma,3.3,40,"Dec 1 - Dec 13, 2021",2021-12-01,2021-12-13,31.8,27.6,36.2,27.6 - 36.2,30.4-36.5,
5695,Took Prescription Medication for Mental Health...,By State,Alabama,Alabama,3.1,32,"Jun 9 - Jun 21, 2021",2021-06-09,2021-06-21,22.2,18.1,26.7,18.1 - 26.7,22.2-24.9,
9648,Took Prescription Medication for Mental Health...,By State,Utah,Utah,3.4,43,"Mar 2 - Mar 14, 2022",2022-03-02,2022-03-14,31.7,28.0,35.6,28.0 - 35.6,29.7-38.0,
7580,Needed Counseling or Therapy But Did Not Get I...,By Education,United States,High school diploma or GED,3.2,37,"Sep 1 - Sep 13, 2021",2021-09-01,2021-09-13,9.2,8.4,10.1,8.4 - 10.1,,
993,"Received Counseling or Therapy, Last 4 Weeks",By State,Oklahoma,Oklahoma,2.0,16,"Sep 30 - Oct 12, 2020",2020-09-30,2020-10-12,7.8,6.0,10.0,6.0 - 10.0,7.7-8.8,
1431,Needed Counseling or Therapy But Did Not Get I...,By State,Tennessee,Tennessee,2.0,17,"Oct 14 - Oct 26, 2020",2020-10-14,2020-10-26,9.9,8.1,12.0,8.1 - 12.0,9.4-10.0,
8763,Took Prescription Medication for Mental Health...,By State,Virginia,Virginia,3.3,41,"Dec 29, 2021 - Jan 10, 2022",2021-12-29,2022-01-10,23.2,20.2,26.4,20.2 - 26.4,20.5-24.2,
10149,Took Prescription Medication for Mental Health...,By State,Oklahoma,Oklahoma,3.4,45,"Apr 27 - May 9, 2022",2022-04-27,2022-05-09,27.3,22.6,32.3,22.6 - 32.3,25.3-27.5,
587,Took Prescription Medication for Mental Health...,By Presence of Symptoms of Anxiety/Depression,United States,Experienced symptoms of anxiety/depression in ...,2.0,15,"Sep 16 - Sep 28, 2020",2020-09-16,2020-09-28,33.9,32.9,34.9,32.9 - 34.9,,
4534,"Received Counseling or Therapy, Last 4 Weeks",By Race/Hispanic ethnicity,United States,"Non-Hispanic Black, single race",3.1,28,"Apr 14 - Apr 26, 2021",2021-04-14,2021-04-26,8.3,7.3,9.5,7.3 - 9.5,,


In [22]:
unemployment_for_men.sample(10)

# Label (which is a date) can be formatted better

Unnamed: 0,Series ID,Year,Period,Label,Value
23,LNS14000025,2021,M12,2021 Dec,3.6
21,LNS14000025,2021,M10,2021 Oct,4.3
12,LNS14000025,2021,M01,2021 Jan,6.1
27,LNS14000025,2022,M04,2022 Apr,3.5
9,LNS14000025,2020,M10,2020 Oct,6.7
0,LNS14000025,2020,M01,2020 Jan,3.2
22,LNS14000025,2021,M11,2021 Nov,3.9
16,LNS14000025,2021,M05,2021 May,5.8
7,LNS14000025,2020,M08,2020 Aug,7.9
6,LNS14000025,2020,M07,2020 Jul,9.3


In [24]:
unemployment_for_men['Label'] = pd.to_datetime(unemployment_for_men['Label'])

# check to see it's fixed
unemployment_for_men.sample(10)

Unnamed: 0,Series ID,Year,Period,Label,Value
9,LNS14000025,2020,M10,2020-10-01,6.7
28,LNS14000025,2022,M05,2022-05-01,3.4
14,LNS14000025,2021,M03,2021-03-01,5.8
27,LNS14000025,2022,M04,2022-04-01,3.5
10,LNS14000025,2020,M11,2020-11-01,6.7
1,LNS14000025,2020,M02,2020-02-01,3.2
6,LNS14000025,2020,M07,2020-07-01,9.3
23,LNS14000025,2021,M12,2021-12-01,3.6
24,LNS14000025,2022,M01,2022-01-01,3.8
7,LNS14000025,2020,M08,2020-08-01,7.9


In [23]:
unemployment_for_women.sample(10)

# Label (which is a date) can be formatted better

Unnamed: 0,Series ID,Year,Period,Label,Value
19,LNS14000026,2021,M08,2021 Aug,4.8
18,LNS14000026,2021,M07,2021 Jul,5.0
20,LNS14000026,2021,M09,2021 Sep,4.3
4,LNS14000026,2020,M05,2020 May,13.8
12,LNS14000026,2021,M01,2021 Jan,6.0
6,LNS14000026,2020,M07,2020 Jul,10.4
13,LNS14000026,2021,M02,2021 Feb,5.9
22,LNS14000026,2021,M11,2021 Nov,3.9
25,LNS14000026,2022,M02,2022 Feb,3.6
3,LNS14000026,2020,M04,2020 Apr,15.4


In [25]:
unemployment_for_women['Label'] = pd.to_datetime(unemployment_for_women['Label'])

# check to see it's fixed
unemployment_for_women.sample(10)

Unnamed: 0,Series ID,Year,Period,Label,Value
14,LNS14000026,2021,M03,2021-03-01,5.7
15,LNS14000026,2021,M04,2021-04-01,5.6
19,LNS14000026,2021,M08,2021-08-01,4.8
23,LNS14000026,2021,M12,2021-12-01,3.6
24,LNS14000026,2022,M01,2022-01-01,3.6
1,LNS14000026,2020,M02,2020-02-01,3.1
8,LNS14000026,2020,M09,2020-09-01,7.8
26,LNS14000026,2022,M03,2022-03-01,3.3
0,LNS14000026,2020,M01,2020-01-01,3.2
10,LNS14000026,2020,M11,2020-11-01,6.2


## Unnecessary Data

Look for the different types of unnecessary data in your dataset and address it as needed. Make sure to use code comments to illustrate your thought process.

In [None]:
# from running info() and smapling the data frames earlier, we can tell that some of the columns are redundant
# so we'll be removing them in this section. we'll also be checking for duplicated data.

In [32]:
# no uneccessary columns in data
# check for duplicates
employment_rate.duplicated()

# no duplicates found

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28    False
29    False
30    False
31    False
32    False
33    False
34    False
35    False
36    False
37    False
38    False
39    False
40    False
dtype: bool

In [None]:
# no uneccessary columns in data
# check for duplicates
job_postings.duplicated()

# no duplicates found

In [34]:
# drop uneccessary columns
mental_health.drop(
    columns = [
        "Phase",
        "Time Period",
        "Time Period Label",
        "LowCI",
        "HighCI",
        "Confidence Interval",
        "Quartile Range",
        "Suppression Flag"
    ],
    inplace = True
)

# check for duplicates
mental_health.duplicated()

# no duplicates found

0        False
1        False
2        False
3        False
4        False
         ...  
10399    False
10400    False
10401    False
10402    False
10403    False
Length: 9914, dtype: bool

In [36]:
# no uneccessary columns in data
# check for duplicates
unemployment_for_men.duplicated()

# no duplicates found

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28    False
dtype: bool

In [37]:
# no uneccessary columns in data
# check for duplicates
unemployment_for_women.duplicated()

# no duplicates found

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28    False
dtype: bool

## Inconsistent Data

Check for inconsistent data and address any that arises. As always, use code comments to illustrate your thought process.

In [None]:
# we'll also be sampling the data to check for inconsistency
# we'll also be renaming some columns to make them intuitive and consistent

In [43]:
employment_rate.rename(
    columns = {"Label": "Time Period Start Date", "Period": "Month"},
    inplace = True
)

employment_rate.sample(10)

# from the sample, none look inconsistent

Unnamed: 0,Series ID,Year,Month,Time Period Start Date,Value
5,LNS14000000,2019,M06,2019-06-01,3.6
3,LNS14000000,2019,M04,2019-04-01,3.6
13,LNS14000000,2020,M02,2020-02-01,3.5
11,LNS14000000,2019,M12,2019-12-01,3.6
40,LNS14000000,2022,M05,2022-05-01,3.6
17,LNS14000000,2020,M06,2020-06-01,11.0
26,LNS14000000,2021,M03,2021-03-01,6.0
10,LNS14000000,2019,M11,2019-11-01,3.6
6,LNS14000000,2019,M07,2019-07-01,3.7
15,LNS14000000,2020,M04,2020-04-01,14.7


In [45]:
# rename columns
job_postings.rename(
    columns = {"IHLCHGNEWUS": "Relative Freq"},
    inplace = True
)

job_postings.sample(10)

# from the sample, none look inconsistent

Unnamed: 0,DATE,Relative Freq
171,2020-07-21,-12.8
71,2020-04-12,-52.3
841,2022-05-22,75.2
728,2022-01-29,71.0
351,2021-01-17,22.7
457,2021-05-03,56.5
91,2020-05-02,-47.2
52,2020-03-24,-35.2
470,2021-05-16,58.9
420,2021-03-27,46.4


In [35]:
mental_health.sample(10)

# from the sample, none look inconsistent

Unnamed: 0,Indicator,Group,State,Subgroup,Time Period Start Date,Time Period End Date,Value
10309,Took Prescription Medication for Mental Health...,By State,Oklahoma,Oklahoma,2022-04-27,2022-05-09,29.5
7525,Took Prescription Medication for Mental Health...,By State,Maryland,Maryland,2021-09-01,2021-09-13,24.8
8066,"Received Counseling or Therapy, Last 4 Weeks",By State,Alaska,Alaska,2021-09-29,2021-10-11,12.6
63,Took Prescription Medication for Mental Health...,By State,Tennessee,Tennessee,2020-08-19,2020-08-31,21.2
6095,"Received Counseling or Therapy, Last 4 Weeks",By State,New Jersey,New Jersey,2021-06-23,2021-07-05,9.6
5799,"Received Counseling or Therapy, Last 4 Weeks",By State,New Jersey,New Jersey,2021-06-09,2021-06-21,11.1
4033,Took Prescription Medication for Mental Health...,By State,South Carolina,South Carolina,2021-03-03,2021-03-15,26.0
5780,"Received Counseling or Therapy, Last 4 Weeks",By State,Hawaii,Hawaii,2021-06-09,2021-06-21,8.6
3293,Took Prescription Medication for Mental Health...,By State,Maryland,Maryland,2021-02-03,2021-02-15,23.2
6024,Took Prescription Medication for Mental Health...,By State,North Carolina,North Carolina,2021-06-23,2021-07-05,28.1


In [44]:
# rename columns
employment_rate.rename(
    columns = {"Label": "Time Period Start Date", "Period": "Month"},
    inplace = True
)

unemployment_for_men.sample(10)

# from the sample, none look inconsistent

Unnamed: 0,Series ID,Year,Label,Value
16,LNS14000025,2021,2021-05-01,5.8
17,LNS14000025,2021,2021-06-01,5.9
6,LNS14000025,2020,2020-07-01,9.3
25,LNS14000025,2022,2022-02-01,3.5
18,LNS14000025,2021,2021-07-01,5.3
9,LNS14000025,2020,2020-10-01,6.7
23,LNS14000025,2021,2021-12-01,3.6
24,LNS14000025,2022,2022-01-01,3.8
13,LNS14000025,2021,2021-02-01,6.0
14,LNS14000025,2021,2021-03-01,5.8


In [40]:
# rename columns
employment_rate.rename(
    columns = {"Label": "Time Period Start Date", "Period": "Month"},
    inplace = True
)

unemployment_for_women.sample(10)

# from the sample, none look inconsistent

Unnamed: 0,Series ID,Year,Label,Value
1,LNS14000026,2020,2020-02-01,3.1
7,LNS14000026,2020,2020-08-01,8.3
13,LNS14000026,2021,2021-02-01,5.9
27,LNS14000026,2022,2022-04-01,3.2
3,LNS14000026,2020,2020-04-01,15.4
19,LNS14000026,2021,2021-08-01,4.8
20,LNS14000026,2021,2021-09-01,4.3
10,LNS14000026,2020,2020-11-01,6.2
15,LNS14000026,2021,2021-04-01,5.6
9,LNS14000026,2020,2020-10-01,6.5


## Summarize Your Results

Make note of your answers to the following questions.

1. Did you find all four types of dirty data in your dataset?
Yes I did.

2. Did the process of cleaning your data give you new insights into your dataset?
Yes. I saw how some columns connect to others and some a

3. Is there anything you would like to make note of when it comes to manipulating the data and making visualizations?
- Working with clean data makes much easier to connect dots and visualize
- Rename columns so that you and your readers can more easily understand the underlying data
- Commenting and documenting your work and thought process makes it easier to remember why you made certain decisions
- Cleaning data also makes you more familiar with the data you're working with