
# Import libraries

In [1]:
import pandas as pd   
import altair as alt   
import numpy as np

# Import data from GitHub

In [2]:
url = "https://github.com/carlosdgerez/machine_learning/raw/main/competence/Children%20whose%20parents%20lack%20secure%20employment.csv"
df = pd.read_csv(url)

In [3]:
df

Unnamed: 0,LocationType,Location,TimeFrame,DataFormat,Data
0,State,Idaho,2008,Number,108000.0
1,State,Idaho,2008,Percent,0.26
2,State,Idaho,2009,Number,120000.0
3,State,Idaho,2009,Percent,0.29
4,State,Idaho,2010,Number,134000.0
5,State,Idaho,2010,Percent,0.31
6,State,Idaho,2011,Number,132000.0
7,State,Idaho,2011,Percent,0.31
8,State,Idaho,2012,Number,120000.0
9,State,Idaho,2012,Percent,0.28


# Prepare the data for the graphs.

In [4]:
df2 = df.pivot_table(index = ["LocationType","Location", "TimeFrame"],
               columns = "DataFormat",
               values ="Data").reset_index()

In [5]:
df2

DataFormat,LocationType,Location,TimeFrame,Number,Percent
0,State,Idaho,2008,108000.0,0.26
1,State,Idaho,2009,120000.0,0.29
2,State,Idaho,2010,134000.0,0.31
3,State,Idaho,2011,132000.0,0.31
4,State,Idaho,2012,120000.0,0.28
5,State,Idaho,2013,109000.0,0.26
6,State,Idaho,2014,104000.0,0.24
7,State,Idaho,2015,109000.0,0.25
8,State,Idaho,2016,104000.0,0.24
9,State,Idaho,2017,105000.0,0.24
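
The pivot above turns the long layout (one row per year and `DataFormat`) into a wide one (one row per year, with `Number` and `Percent` side by side). A minimal sketch of the same reshaping on a small frame that copies the first Idaho rows; the names `long_df` and `wide_df` are introduced here only for illustration:

```python
import pandas as pd

# Small long-format frame mirroring the first Idaho rows shown earlier.
long_df = pd.DataFrame({
    "Location": ["Idaho", "Idaho", "Idaho", "Idaho"],
    "TimeFrame": [2008, 2008, 2009, 2009],
    "DataFormat": ["Number", "Percent", "Number", "Percent"],
    "Data": [108000.0, 0.26, 120000.0, 0.29],
})

# Each distinct DataFormat value becomes its own column.
wide_df = long_df.pivot_table(
    index=["Location", "TimeFrame"],
    columns="DataFormat",
    values="Data",
).reset_index()

print(wide_df)
```

Note that `pivot_table` aggregates with the mean by default, so if the source ever held two rows for the same location, year, and format, they would be averaged silently rather than raising an error.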


# Inspect the data types and fix the type of the date variable.

In [6]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13 entries, 0 to 12
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   LocationType  13 non-null     object 
 1   Location      13 non-null     object 
 2   TimeFrame     13 non-null     int64  
 3   Number        13 non-null     float64
 4   Percent       13 non-null     float64
dtypes: float64(2), int64(1), object(2)
memory usage: 648.0+ bytes


In [7]:
df2['TimeFrame'] = pd.to_datetime(df2['TimeFrame'].astype(str), format='%Y')
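
The conversion matters because a temporal (`:T`) encoding expects date-like values; as far as I know, plain integers such as `2008` are otherwise read as millisecond timestamps, which places every point in 1970. A minimal sketch of the conversion on its own, using an illustrative series named `years`:

```python
import pandas as pd

# Illustrative integer years, like the TimeFrame column before conversion.
years = pd.Series([2008, 2009, 2010], name="TimeFrame")

# Integer -> string -> timestamp at January 1 of each year.
as_dates = pd.to_datetime(years.astype(str), format="%Y")

print(as_dates.dtype)
print(as_dates.dt.year.tolist())
```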

# Make the graph in Altair.

In [8]:
title = "Children whose parents lack secure employment"
subtitle = "for Idaho"

In [9]:
chart = (
    alt.Chart(df2)
    .mark_bar()
    .encode(
        x=alt.X("TimeFrame:T", axis=alt.Axis(labelAngle=45)),
        y=alt.Y("Number"),
    )
    .properties(title={"text": title, "subtitle": subtitle})
)

# Data for 2020 is missing, probably due to COVID-19.

In [10]:
chart
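
A gap like the one noted above can also be confirmed without reading it off the chart. The following is a rough sketch on a hypothetical frame named `sample`; its years and values are made up for illustration and are not taken from the Idaho data:

```python
import pandas as pd

# Hypothetical yearly data with a deliberate gap (illustrative values only).
sample = pd.DataFrame({
    "TimeFrame": pd.to_datetime(["2018", "2019", "2021"], format="%Y"),
    "Number": [100000.0, 98000.0, 101000.0],
})

# Compare the years that are present with the full range they span.
observed = {int(year) for year in sample["TimeFrame"].dt.year}
expected = set(range(min(observed), max(observed) + 1))
missing_years = sorted(expected - observed)

print(missing_years)
```

Running the same comparison on `df2["TimeFrame"]` would list whichever years are absent from the real series.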

In [11]:
chart2 = (
    alt.Chart(df2)
    .mark_bar(color="green")
    .encode(
        x=alt.X("TimeFrame:T", axis=alt.Axis(labelAngle=45)),
        y=alt.Y("Percent"),
    )
    .properties(title={"text": title + " by Percent", "subtitle": subtitle})
)

In [12]:
chart2 | chart

# Download new data from the same source
I want to find out whether there is a trend in unemployment rates across states.

In [13]:
url = "https://raw.githubusercontent.com/carlosdgerez/machine_learning/main/competence/Unemployment%20rate%20of%20parents(1).csv"
dfstates = pd.read_csv(url)

In [14]:
dfstates

Unnamed: 0,LocationType,Location,TimeFrame,DataFormat,Data
0,Nation,United States,2007,Number,2109000.00
1,Nation,United States,2007,Percent,0.04
2,Nation,United States,2008,Percent,0.05
3,Nation,United States,2008,Number,2647000.00
4,Nation,United States,2013,Number,3184000.00
...,...,...,...,...,...
1555,State,Wyoming,2013,Number,3000.00
1556,State,Wyoming,2008,Percent,0.02
1557,State,Wyoming,2008,Number,2000.00
1558,State,Wyoming,2007,Number,2000.00


Since all data from this source is formatted the same way, I create a function that processes it for the graphs and prepares it for machine learning.

In [15]:
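def prepareData(data):
    """Pivot the long-format rows into one row per location and year,
    with "Number" and "Percent" as separate columns."""
    data = data.pivot_table(index=["LocationType", "Location", "TimeFrame"],
                            columns="DataFormat",
                            values="Data").reset_index()
    # Parse the year to validate it, then keep it as a plain integer.
    data['TimeFrame'] = pd.to_datetime(data['TimeFrame'].astype(str), format='%Y').dt.year

    return data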


In [16]:
dfstates = prepareData(dfstates)

In [17]:
dfstates.columns


Index(['LocationType', 'Location', 'TimeFrame', 'Number', 'Percent'], dtype='object', name='DataFormat')
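
The `name='DataFormat'` in the output above is a leftover from the pivot: the column index keeps the name of the column that was spread out. It is harmless, but it shows up as a stray label when the frame is displayed. A minimal sketch of clearing it, using a hypothetical stand-in function `prepare_data_sketch` and an illustrative `toy` frame rather than the real data:

```python
import pandas as pd

def prepare_data_sketch(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for prepareData that also clears the column-index name."""
    wide = data.pivot_table(
        index=["LocationType", "Location", "TimeFrame"],
        columns="DataFormat",
        values="Data",
    ).reset_index()
    wide.columns.name = None  # drop the leftover "DataFormat" label
    return wide

# Illustrative long-format rows in the same layout as the source files.
toy = pd.DataFrame({
    "LocationType": ["State"] * 4,
    "Location": ["Wyoming"] * 4,
    "TimeFrame": [2007, 2007, 2008, 2008],
    "DataFormat": ["Number", "Percent", "Number", "Percent"],
    "Data": [2000.0, 0.02, 2000.0, 0.02],
})

result = prepare_data_sketch(toy)
print(result.columns)
```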

In [18]:
dfstatesFilter = dfstates.loc[dfstates.LocationType != "Nation"]

In [19]:
dfstatesFilter

DataFormat,LocationType,Location,TimeFrame,Number,Percent
0,City,District of Columbia,2007,5000.0,0.08
1,City,District of Columbia,2008,5000.0,0.09
2,City,District of Columbia,2009,8000.0,0.12
3,City,District of Columbia,2010,8000.0,0.11
4,City,District of Columbia,2011,8000.0,0.12
...,...,...,...,...,...
775,State,Wyoming,2017,3000.0,0.03
776,State,Wyoming,2018,3000.0,0.03
777,State,Wyoming,2019,2000.0,0.02
778,State,Wyoming,2020,4000.0,0.05


In [20]:
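# Years are plain integers after prepareData, so encode them as ordinal (":O");
# ":T" would read them as millisecond timestamps.
chart4 = (
    alt.Chart(dfstatesFilter)
    .mark_line()
    .encode(
        x=alt.X("TimeFrame:O", axis=alt.Axis(labelAngle=45)),
        y=alt.Y("Number"),
        color="Location",
        tooltip=["LocationType", "Location", "TimeFrame", "Number", "Percent"],
    )
    .properties(title={"text": "Unemployment rate of parents",
                       "subtitle": "by location, excluding the national total"})
    .interactive()
)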

In [21]:
chart4
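
With this many lines on one chart, a trend is hard to judge by eye. One possible way to put a number on it, offered here as a sketch rather than as the method this notebook settles on, is a least-squares slope of `Percent` against year for each location. The frame `toy`, its locations "A" and "B", and the helper `yearly_slope` are all hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one rising and one falling location.
toy = pd.DataFrame({
    "Location": ["A", "A", "A", "B", "B", "B"],
    "TimeFrame": [2018, 2019, 2020, 2018, 2019, 2020],
    "Percent": [0.04, 0.05, 0.06, 0.05, 0.04, 0.03],
})

def yearly_slope(group: pd.DataFrame) -> float:
    """Least-squares slope of Percent per year for one location."""
    # Center the years so the fit stays well conditioned.
    x = group["TimeFrame"] - group["TimeFrame"].mean()
    slope, _intercept = np.polyfit(x, group["Percent"], deg=1)
    return float(slope)

slopes = {
    location: yearly_slope(group)
    for location, group in toy.groupby("Location")
}
print(slopes)
```

A positive slope means the rate rose over the period and a negative one means it fell; applying the same loop to `dfstatesFilter` would give one slope per state.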