# Data Cleaning | Democracy Index

#### This notebook cleans the Democracy_index file.
#### The data was copied from https://docs.google.com/spreadsheets/d/1d0noZrwAWxNBTDSfDgG06_aLGWUz4R6fgDhRaUZbDzE/edit#gid=176703676 on 29/11/21 at 08:51 AM, pasted into Excel, and saved as a CSV. The data is produced by the Economist Intelligence Unit (EIU); more information is available at https://www.eiu.com/topic/democracy-index/

## Import Libraries

In [53]:
import pandas as pd
import numpy as np

## Import CSV

In [54]:
democracy_data = pd.read_csv('democracy_index.csv', sep=";")

In [55]:
# View the DataFrame.
democracy_data

Unnamed: 0,geo,name,time,Democracy index (EIU),Electoral pluralism index (EIU),Government index (EIU),Political participation index(EIU),Political culture index (EIU),Civil liberties index (EIU),Change in democracy index (EIU)
0,afg,Afghanistan,2006.0,30.6,61.70,0.00,22.2,25.0,44.10,
1,afg,Afghanistan,2007.0,30.4,56.70,3.95,22.2,25.0,44.10,-0.2
2,afg,Afghanistan,2008.0,30.2,51.70,7.90,22.2,25.0,44.10,-0.2
3,afg,Afghanistan,2009.0,27.5,38.35,7.90,25.0,25.0,41.15,-2.7
4,afg,Afghanistan,2010.0,24.8,25.00,7.90,27.8,25.0,38.20,-2.7
...,...,...,...,...,...,...,...,...,...,...
4195,,,,,,,,,,
4196,,,,,,,,,,
4197,,,,,,,,,,
4198,,,,,,,,,,


### What do we notice about the dataframe so far?

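1. Many of the trailing rows are entirely null and are not needed.
2. Some rows that do contain data still have missing values in individual columns (for example, the change in democracy index).
3. The year is stored as a float (2006.0); it doesn't need a decimal point.
4. We don't need all of these attributes.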
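
These observations can also be checked programmatically rather than by eye. The sketch below is only an illustration: `sample` is a small hypothetical frame that imitates the layout shown above, not the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for democracy_data: three populated rows
# (one with a gap) followed by two rows that are entirely empty.
sample = pd.DataFrame({
    "geo": ["afg", "afg", "afg", np.nan, np.nan],
    "time": [2006.0, 2007.0, 2008.0, np.nan, np.nan],
    "Democracy index (EIU)": [30.6, 30.4, 30.2, np.nan, np.nan],
    "Change in democracy index (EIU)": [np.nan, -0.2, -0.2, np.nan, np.nan],
})

# Observation 1: count rows in which every column is null.
fully_empty_rows = int(sample.isna().all(axis=1).sum())

# Observation 2: nulls per column. A count larger than fully_empty_rows
# means that populated rows also have gaps in that column.
nulls_per_column = sample.isna().sum()

# Observation 3: the year column is stored as a float.
time_dtype = sample["time"].dtype

print(fully_empty_rows)
print(nulls_per_column)
print(time_dtype)
```

Running the same three expressions on `democracy_data` would give the real counts for this file.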

### Nulls

In [56]:
# Dropping rows where the whole row is empty.
democracy_data = democracy_data.dropna(how='all')
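
The choice of `how='all'` matters here, because some populated rows have a missing value in a single column. A minimal sketch on a hypothetical three-row frame (`toy` and its column names are made up for illustration) shows the difference from `how='any'`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one row with a single gap, one complete row,
# and one row that is entirely empty.
toy = pd.DataFrame({
    "name": ["Afghanistan", "Afghanistan", np.nan],
    "score": [30.6, 30.4, np.nan],
    "change": [np.nan, -0.2, np.nan],
})

# how="all" drops a row only when every value in it is null.
only_empty_removed = toy.dropna(how="all")

# how="any" (the default) drops a row if even one value is null,
# which would also discard the first row here.
any_gap_removed = toy.dropna(how="any")

print(len(only_empty_removed))
print(len(any_gap_removed))
```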

In [59]:
democracy_data

Unnamed: 0,geo,name,time,Democracy index (EIU),Electoral pluralism index (EIU),Government index (EIU),Political participation index(EIU),Political culture index (EIU),Civil liberties index (EIU),Change in democracy index (EIU)
0,afg,Afghanistan,2006-01-01,30.6,61.70,0.00,22.2,25.0,44.10,
1,afg,Afghanistan,2007-01-01,30.4,56.70,3.95,22.2,25.0,44.10,-0.2
2,afg,Afghanistan,2008-01-01,30.2,51.70,7.90,22.2,25.0,44.10,-0.2
3,afg,Afghanistan,2009-01-01,27.5,38.35,7.90,25.0,25.0,41.15,-2.7
4,afg,Afghanistan,2010-01-01,24.8,25.00,7.90,27.8,25.0,38.20,-2.7
...,...,...,...,...,...,...,...,...,...,...
2496,zwe,Zimbabwe,2016-01-01,30.5,5.00,20.00,38.9,56.3,32.40,
2497,zwe,Zimbabwe,2017-01-01,31.6,5.00,20.00,44.4,56.3,32.40,
2498,zwe,Zimbabwe,2018-01-01,31.6,5.00,20.00,44.4,56.3,32.40,
2499,zwe,Zimbabwe,2019-01-01,31.6,0.00,25.00,44.4,56.3,32.40,


### Time

In [60]:
democracy_data['time'] = pd.to_datetime(democracy_data['time'], format='%Y')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  democracy_data['time'] = pd.to_datetime(democracy_data['time'], format='%Y')
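
The message above is pandas' SettingWithCopyWarning. It commonly appears when a column is assigned on a frame that was derived from another frame by filtering. One way to avoid it, sketched here on a hypothetical frame (`raw` and `cleaned` are illustrative names, not part of this notebook), is to take an explicit copy after dropping rows. The sketch also casts the float years to whole-number strings before parsing, on the assumption that no year is missing once the empty rows are gone.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the raw data: float years plus one empty row.
raw = pd.DataFrame({
    "name": ["Afghanistan", "Afghanistan", np.nan],
    "time": [2006.0, 2007.0, np.nan],
})

# .copy() makes the filtered result an independent frame, so the
# column assignment below is unambiguous.
cleaned = raw.dropna(how="all").copy()

# float -> int -> str, so that "%Y" parses "2006" rather than "2006.0".
# Assumes there are no nulls left in the "time" column.
year_text = cleaned["time"].astype(int).astype(str)
cleaned["time"] = pd.to_datetime(year_text, format="%Y")

print(cleaned["time"].dt.year.tolist())
```

If only the year itself is needed, `astype(int)` alone would remove the decimal point without converting to a datetime.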


In [61]:
democracy_data

Unnamed: 0,geo,name,time,Democracy index (EIU),Electoral pluralism index (EIU),Government index (EIU),Political participation index(EIU),Political culture index (EIU),Civil liberties index (EIU),Change in democracy index (EIU)
0,afg,Afghanistan,2006-01-01,30.6,61.70,0.00,22.2,25.0,44.10,
1,afg,Afghanistan,2007-01-01,30.4,56.70,3.95,22.2,25.0,44.10,-0.2
2,afg,Afghanistan,2008-01-01,30.2,51.70,7.90,22.2,25.0,44.10,-0.2
3,afg,Afghanistan,2009-01-01,27.5,38.35,7.90,25.0,25.0,41.15,-2.7
4,afg,Afghanistan,2010-01-01,24.8,25.00,7.90,27.8,25.0,38.20,-2.7
...,...,...,...,...,...,...,...,...,...,...
2496,zwe,Zimbabwe,2016-01-01,30.5,5.00,20.00,38.9,56.3,32.40,
2497,zwe,Zimbabwe,2017-01-01,31.6,5.00,20.00,44.4,56.3,32.40,
2498,zwe,Zimbabwe,2018-01-01,31.6,5.00,20.00,44.4,56.3,32.40,
2499,zwe,Zimbabwe,2019-01-01,31.6,0.00,25.00,44.4,56.3,32.40,


### Unnecessary Attributes

Of the index columns, Democracy index (EIU) is the only one we need; the identifying columns (geo, name, time) are kept as well.

In [66]:
democracy_data = democracy_data.loc[:, :'Democracy index (EIU)']
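
Label-based slices with `.loc` include the end label, which is why `Democracy index (EIU)` itself is kept. The slice does depend on column order, though. The sketch below, on a hypothetical one-row frame (`toy`), compares it with selecting an explicit list of columns, an alternative that does not depend on the order:

```python
import pandas as pd

# Hypothetical frame with the same leading columns as the data above.
toy = pd.DataFrame({
    "geo": ["afg"],
    "name": ["Afghanistan"],
    "time": [2006],
    "Democracy index (EIU)": [30.6],
    "Government index (EIU)": [0.0],
})

# Label slice: everything up to and including the named column.
by_slice = toy.loc[:, :"Democracy index (EIU)"]

# Explicit list: independent of where the columns sit in the frame.
wanted = ["geo", "name", "time", "Democracy index (EIU)"]
by_list = toy[wanted]

print(list(by_slice.columns))
print(by_slice.equals(by_list))
```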

In [70]:
democracy_data

Unnamed: 0,geo,name,time,Democracy index (EIU)
0,afg,Afghanistan,2006-01-01,30.6
1,afg,Afghanistan,2007-01-01,30.4
2,afg,Afghanistan,2008-01-01,30.2
3,afg,Afghanistan,2009-01-01,27.5
4,afg,Afghanistan,2010-01-01,24.8
...,...,...,...,...
2496,zwe,Zimbabwe,2016-01-01,30.5
2497,zwe,Zimbabwe,2017-01-01,31.6
2498,zwe,Zimbabwe,2018-01-01,31.6
2499,zwe,Zimbabwe,2019-01-01,31.6


In [69]:
# Count the nulls in the Democracy index (EIU) column.
democracy_data['Democracy index (EIU)'].isna().sum()

0
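
The same check can be extended to every retained column and turned into a guard that fails loudly if a null slips through. This is a sketch on a hypothetical two-row frame (`final`), assuming the four columns kept above:

```python
import pandas as pd

# Hypothetical stand-in for the cleaned frame.
final = pd.DataFrame({
    "geo": ["afg", "zwe"],
    "name": ["Afghanistan", "Zimbabwe"],
    "time": pd.to_datetime(["2006", "2019"], format="%Y"),
    "Democracy index (EIU)": [30.6, 31.6],
})

# Null count for each column in one call.
null_counts = final.isna().sum()
print(null_counts)

# Guard: stop here if any retained column still contains nulls.
total_nulls = int(null_counts.sum())
assert total_nulls == 0, "unexpected nulls in cleaned data"
```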