## **Unemployment**
Unemployment refers to individuals who are employable and actively seeking a job but unable to find one. This group also includes people in the workforce who are working but do not have an appropriate job. It is usually measured by the unemployment rate, which is the number of unemployed people divided by the total number of people in the workforce, and it serves as one of the indicators of a country's economic status. [Source](https://corporatefinanceinstitute.com/resources/knowledge/economics/unemployment/)
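
As a small worked example of the definition above, the rate can be computed as a percentage of the workforce. This is only a sketch: the function name and the figures are hypothetical, and it assumes the rate is expressed in percent, as the Rate column later in this notebook appears to be.

```python
def unemployment_rate(unemployed: int, labor_force: int) -> float:
    """Return the unemployment rate as a percentage of the labor force."""
    if labor_force <= 0:
        raise ValueError("labor_force must be positive")
    return 100.0 * unemployed / labor_force


# Hypothetical figures, for illustration only
example_rate = unemployment_rate(unemployed=500, labor_force=10_000)
print(example_rate)  # 5.0
```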

## **Unemployment in the United States**
The Wikipedia article on unemployment in the United States discusses the causes and measures of U.S. unemployment and strategies for reducing it. Job creation and unemployment are affected by factors such as economic conditions, global competition, education, automation, and demographics. These factors can affect the number of workers, the duration of unemployment, and wage levels. Continue reading: [Wikipedia](https://en.wikipedia.org/wiki/Unemployment_in_the_United_States#:~:text=In%20September%202019%2C%20the%20U.S.,pandemic%20in%20the%20United%20States.)

# Import necessary libraries, import data, set options.

In [None]:
#!pip install pandas_profiling --upgrade

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#from pandas_profiling import ProfileReport
%matplotlib inline
plt.style.use("ggplot")

In [None]:
#general_report = ProfileReport(data)
#general_report

In [None]:
data = pd.read_csv('../input/unemployment-by-county-us/output.csv')
data.head()

## Data Exploration

In [None]:
data.describe(include="all")

Description of the dataset:

*   There are 885,548 rows and 5 columns.
*   The Year column holds the year in which each unemployment rate was recorded. The minimum is 1990 and the maximum is 2016, so the data covers 27 calendar years (a 26-year span).
*   The Month column contains 12 distinct values, as expected, and March is the most frequent month.
*   The State column has 47 distinct values. The US has 50 states, so three states are missing from the data. Texas appears most often, with 57,658 rows.
*   A county in the US is an administrative or political subdivision of a state: a geographic region with specific boundaries and usually some level of governmental authority, much like a local government area. The US has 3,144 counties, while the County column has 1,752 distinct values. Counties in different states can share a name, so this figure counts names rather than counties, but with three states missing, not all counties are represented in the data.
*   The Rate column holds the unemployment rate for counties in the 47 states. The minimum of 0.00 means that at least one place recorded zero unemployment in some period, and the maximum of 58.4 means that at least one place recorded an unemployment rate above 50%.
*   Lastly, the count row indicates that there are no missing values, but let's cross-check.
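
Because county names can repeat across states, counting distinct (State, County) pairs gives a closer estimate of how many counties are covered than counting names alone. The sketch below uses a made-up toy frame with the same column names as the dataset; the rows and rates are placeholders, not values from `output.csv`.

```python
import pandas as pd

# Toy frame with the dataset's column names; all values are made up.
sample = pd.DataFrame({
    "State": ["Texas", "Texas", "Ohio", "Ohio", "Texas"],
    "County": [
        "Washington County",
        "Starr County",
        "Washington County",
        "Washington County",
        "Starr County",
    ],
    "Rate": [4.1, 20.3, 5.0, 5.2, 19.8],
})

# Distinct county names, ignoring which state they belong to
unique_names = sample["County"].nunique()

# Distinct (State, County) combinations
unique_pairs = sample[["State", "County"]].drop_duplicates().shape[0]

print(unique_names, unique_pairs)  # 2 3
```

On the real data, the same two expressions can be applied to `data` instead of `sample`.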

In [None]:
data.isnull().sum()

No null values!

# States Analysis

Let's check for the missing states.


In [None]:
state_=data.State.unique()
state_

From the data, the missing states are Alaska, Florida, and Georgia. This is notable because Florida is the 3rd most populous state in the US, Alaska has the largest area, and Georgia is among the ten most populous states.
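
Rather than reading the missing states off the array by eye, a set difference against a reference list does the comparison. This is a sketch that assumes the State column stores full state names (as "Texas" above suggests); `ALL_STATES` and `missing_states` are names introduced here for illustration.

```python
# Reference list of the 50 US states (full names)
ALL_STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
]


def missing_states(observed):
    """Return the states in ALL_STATES that do not appear in `observed`."""
    return sorted(set(ALL_STATES) - set(observed))


# Stand-in for data['State'].unique(): every state except the three named above
observed = [s for s in ALL_STATES if s not in {"Alaska", "Florida", "Georgia"}]
print(missing_states(observed))  # ['Alaska', 'Florida', 'Georgia']
```

With the dataset loaded, the call would be `missing_states(data['State'].unique())`.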

In [None]:
#Countplot on the states
plt.figure(figsize=(12,6))
g = sns.countplot(x='State', data=data)
g.set_xticklabels(g.get_xticklabels(), rotation=90, ha="right")
plt.show()

Texas appears far more often than any other state, and Delaware appears least often.
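
The reading of the count plot can be checked numerically with `value_counts`. The sketch below runs on a tiny made-up frame; the counts are placeholders and only the column name matches the dataset.

```python
import pandas as pd

# Toy stand-in for the State column; the counts are made up.
sample = pd.DataFrame({"State": ["Texas"] * 4 + ["Ohio"] * 2 + ["Delaware"]})

counts = sample["State"].value_counts()
most_common = counts.idxmax()   # state with the most rows
least_common = counts.idxmin()  # state with the fewest rows

print(most_common, least_common)  # Texas Delaware
```

Applied to the real data, `data['State'].value_counts()` gives the exact row count per state.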

Let's look at the unemployment rate by state.

In [None]:
#average unemployment rate per state
state_rate=data[['State', 'Rate']]
state_rate_= state_rate.groupby(['State'],as_index=False).mean()
state_rate_=state_rate_.sort_values(['Rate'], ascending=False)
state_rate_

In [None]:
fig, ax = plt.subplots(figsize=(14,6))
sns.barplot(x='State', y='Rate', data=state_rate_, ax=ax)
plt.title('Average Unemployment Rate per State')
plt.xticks(rotation='vertical')
plt.show()

Some of the states with the highest average unemployment rates here also rank among the 15 poorest states in the US. [source](https://safety.com/the-poorest-states-in-america)


In [None]:
max_data=pd.DataFrame(data, columns=['Year', 'Month','State','County', 'Rate'])
max_rate=max_data[max_data['Rate']>50]
max_rate

 
*   Rates above 50% were recorded in Texas (1990-1991) and Colorado (1992).
*   San Juan County in Colorado experienced the highest unemployment rate (58.4) in January 1992.
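
The single highest record can also be pulled out directly with `idxmax`, without choosing a threshold first. In the toy frame below, only the San Juan County row reflects the figure quoted above; the other rows are placeholders made up for illustration.

```python
import pandas as pd

# Toy frame with the dataset's column names.
# Only the first row's figures come from the text above; the rest are made up.
sample = pd.DataFrame({
    "Year": [1992, 1990, 1995],
    "Month": ["January", "March", "June"],
    "State": ["Colorado", "Texas", "Ohio"],
    "County": ["San Juan County", "Starr County", "Adams County"],
    "Rate": [58.4, 52.0, 6.1],
})

# idxmax returns the index label of the largest Rate; loc fetches that row
peak = sample.loc[sample["Rate"].idxmax()]
print(peak["State"], peak["County"], peak["Year"], peak["Rate"])
```

On the real data this would be `data.loc[data['Rate'].idxmax()]`.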




In [None]:
min_data=pd.DataFrame(data, columns=['Year', 'Month','State','County', 'Rate'])
min_rate=min_data[min_data['Rate']<0.1]
min_rate

The zero unemployment rates all occur in Texas, between 1990 and 1993.

Interestingly, within Texas between 1990 and 1991, Starr County experienced a very high unemployment rate while Loving County and McMullen County experienced zero unemployment: same state, different counties. Note that the US was just recovering from the early 1990s recession, which lasted eight months, from July 1990 to March 1991.

# Rate Column Analysis

In [None]:
plt.figure(figsize=(12,5))
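# distplot is deprecated in recent seaborn releases; histplot with kde=True is its replacement
sns.histplot(data['Rate'], kde=True)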
plt.show()

The Rate column is positively skewed: the tail on the right side of the distribution is longer or fatter. In such a distribution, the mean and median are typically greater than the mode.

---

 *Skewness is the degree of distortion from the symmetrical bell curve or the normal distribution. It measures the lack of symmetry in data distribution. A symmetrical distribution will have a skewness of 0.*

In [None]:
# Compare the three measures of central tendency
print('mode:', data.Rate.mode().tolist())
print('-'*10)
print('median:', data.Rate.median())
print('-'*10)
print('mean:', data.Rate.mean())

This confirms that Rate is positively skewed: the mean and the median are both greater than the mode.
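
The skewness coefficient itself can also be computed, which avoids inferring the direction from the mode, median, and mean alone. Below is a minimal sketch on a made-up right-skewed series; the numbers are placeholders, not values from the dataset.

```python
import pandas as pd

# Made-up sample with a long right tail
rates = pd.Series([2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 20.0])

# Series.skew() returns the sample skewness; a positive value means a right tail
skewness = rates.skew()

print('skewness:', skewness)
print('mode:', rates.mode().iloc[0])
print('median:', rates.median())
print('mean:', rates.mean())
```

On the real data, `data['Rate'].skew()` returning a positive number would agree with the histogram above.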

# Data Exploration by Year


In [None]:
#average unemployment rate per year
year=data[['Year', 'Rate']]
year_= year.groupby(['Year'],as_index=False).mean()
year_=year_.sort_values(['Year'], ascending=False)
year_

In [None]:
#check the trend of the average rate over the years
fig, ax = plt.subplots(figsize=(12,5))
# year_ holds one value per year, so no confidence band is needed
sns.lineplot(x='Year', y='Rate', data=year_, marker='o', ax=ax)
ax.set_xticks(year_['Year'])
plt.xticks(rotation='vertical')
plt.show()

In [None]:
#check for maximum rate in each year
year=data[['Year', 'Rate']]
year_= year.groupby(['Year'],as_index=False).max()
year_=year_.sort_values(['Year'], ascending=False)

fig, ax = plt.subplots(figsize=(12,5))
sns.lineplot(x='Year', y='Rate', data=year_, marker='o', ax=ax)
ax.set_xticks(year_['Year'])
plt.xticks(rotation='vertical')
plt.show()

In [None]:
#check for minimum rate in each year
year=data[['Year', 'Rate']]
year_= year.groupby(['Year'],as_index=False).min()
year_=year_.sort_values(['Year'], ascending=False)

fig, ax = plt.subplots(figsize=(12,5))
sns.lineplot(x='Year', y='Rate', data=year_, marker='o', ax=ax)
ax.set_xticks(year_['Year'])
plt.xticks(rotation='vertical')
plt.show()
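
The three yearly aggregations above (mean, maximum, minimum) can also be computed in a single pass with `agg`, which keeps them side by side in one table. This is a sketch on a made-up frame; only the column names match the dataset.

```python
import pandas as pd

# Toy frame with the dataset's Year and Rate columns; values are made up.
sample = pd.DataFrame({
    "Year": [1990, 1990, 1991, 1991],
    "Rate": [4.0, 6.0, 5.0, 9.0],
})

# One row per year with the mean, minimum, and maximum rate
yearly = (
    sample.groupby("Year")["Rate"]
    .agg(["mean", "min", "max"])
    .reset_index()
)
print(yearly)
```

With the real data, replacing `sample` with `data` yields one table for all years, and `yearly.plot(x='Year', y=['mean', 'min', 'max'])` would draw the three lines on one chart.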

# Data Exploration by Month

In [None]:
#average unemployment rate per month
Month=data[['Month', 'Rate']]
Month_= Month.groupby(['Month'],as_index=False).mean()
Month_=Month_.sort_values(['Month'], ascending=False)

plt.figure(figsize=(10,5))
g = sns.barplot(x='Month', y='Rate', data=Month_)
g.set_xticklabels(g.get_xticklabels(), rotation=90, ha="right")
plt.show()
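
Sorting the Month column as plain text puts the bars in alphabetical rather than calendar order. One possible way around this, assuming the column stores full English month names such as "March", is to convert it to an ordered categorical before grouping. The frame below is made up for illustration.

```python
import pandas as pd

# Calendar order of the months (assumes full English names in the data)
month_order = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Toy frame with the dataset's Month and Rate columns; values are made up.
sample = pd.DataFrame({
    "Month": ["March", "January", "February", "January"],
    "Rate": [5.0, 7.0, 6.0, 9.0],
})

# An ordered categorical makes groupby sort by calendar position
sample["Month"] = pd.Categorical(sample["Month"], categories=month_order, ordered=True)

# observed=True keeps only the months that actually occur in the data
monthly = sample.groupby("Month", observed=True)["Rate"].mean().reset_index()
print(monthly)
```

Passing `monthly` to `sns.barplot(x='Month', y='Rate', data=monthly)` should then draw the bars from January onward.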

In [None]:
#average unemployment rate per county name (same-named counties in different states are pooled)
county=data[['County', 'Rate']]
County_= county.groupby(['County'],as_index=False).mean()
County_=County_.sort_values(['Rate'], ascending=False)
County_