# Python: Error management

Goals:

* Learn how to find the unique values in a dataset

* Manage missing values in a dataset

* Use the try/except block to handle errors

* Extract the year from a date, convert it to an integer, and add a year column to a dataset

## Sets 

The **set()** function returns a **set** that contains the **unique elements** of a list.

In [1]:
# Example
animals = ["Dog", "Tiger", "Cat", "Cat", "Dog", "Dog"]
unique_animals = set(animals)
print(unique_animals)

{'Dog', 'Cat', 'Tiger'}
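
A set has no defined order, so the order of the printed elements may differ from the one shown above. The following minimal sketch reuses the same list to show how `sorted()` gives a predictable order and how the `in` operator tests whether a value belongs to the set. The variable names `sorted_animals`, `has_cat` and `has_turtle` are only illustrative.

```python
animals = ["Dog", "Tiger", "Cat", "Cat", "Dog", "Dog"]
unique_animals = set(animals)

# A set is unordered; sorted() returns a list in alphabetical order.
sorted_animals = sorted(unique_animals)
print(sorted_animals)

# The "in" operator checks whether a value is present in the set.
has_cat = "Cat" in unique_animals
has_turtle = "Turtle" in unique_animals
print(has_cat, has_turtle)
```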


To add elements to a set, we use the **add()** method.

In [2]:
# Example
unique_animals.add("Turtle")
print(unique_animals)

{'Dog', 'Cat', 'Tiger', 'Turtle'}


To remove an element, we use the **remove()** method.

In [3]:
# Example
unique_animals.remove("Cat")
print(unique_animals)

{'Dog', 'Tiger', 'Turtle'}
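
Note that **remove()** raises a `KeyError` when the element is not in the set, whereas **discard()** silently does nothing. The sketch below illustrates the difference on a small set; it uses the try/except block that is covered later in this notebook, and the `removed` flag is only there for illustration.

```python
unique_animals = {"Dog", "Tiger", "Turtle"}

# remove() raises KeyError because "Cat" is not in the set.
try:
    unique_animals.remove("Cat")
    removed = True
except KeyError:
    removed = False
print(removed)

# discard() leaves the set unchanged when the element is absent.
unique_animals.discard("Cat")
print(sorted(unique_animals))
```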


To convert a set into a list, we use the **list()** function.

In [4]:
unique_animals = list(unique_animals)
print(unique_animals)

['Dog', 'Tiger', 'Turtle']


### Training

In [5]:
import csv

# The with statement closes the file automatically once it has been read
with open("legislators.csv", newline="") as f:
    legislators = list(csv.reader(f))

In [6]:
print(legislators[0:5])

[['last_name', 'first_name', 'birthday', 'gender', 'type', 'state', 'party'], ['Bassett', 'Richard', '1745-04-02', 'M', 'sen', 'DE', 'Anti-Administration'], ['Bland', 'Theodorick', '1742-03-21', '', 'rep', 'VA', ''], ['Burke', 'Aedanus', '1743-06-16', '', 'rep', 'SC', ''], ['Carroll', 'Daniel', '1730-07-22', 'M', 'rep', 'MD', '']]


In [7]:
gender = []

for item in legislators:
    gender.append(item[3])

In [8]:
print(gender[0:50])

['gender', 'M', '', '', 'M', 'M', 'M', '', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', '', 'M', 'M', 'M', 'M', 'M', 'M', 'M', '', 'M', '', 'M', 'M', 'M', 'M', 'M', '', 'M', 'M', 'M', 'M']


In [9]:
gender = set(gender)
print(gender)

{'M', '', 'gender', 'F'}
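
The value 'gender' appears in the set because the first row of the dataset is the header. One possible way to leave it out, sketched below on a few rows copied from the dataset, is to slice the list with `[1:]` before collecting the values. The names `sample_rows` and `gender_values` are assumptions made for this example.

```python
# A few rows copied from the dataset, header included.
sample_rows = [
    ["last_name", "first_name", "birthday", "gender", "type", "state", "party"],
    ["Bassett", "Richard", "1745-04-02", "M", "sen", "DE", "Anti-Administration"],
    ["Bland", "Theodorick", "1742-03-21", "", "rep", "VA", ""],
]

# sample_rows[1:] skips the header row; the braces build a set directly.
gender_values = {row[3] for row in sample_rows[1:]}
print(sorted(gender_values))
```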


## Dataset exploration

Let's look at all the unique parties in our dataset.

In [10]:
party = []

for item in legislators:
    party.append(item[6])

In [11]:
party = set(party)
print(party)

{'', 'Constitutional Unionist', 'American Labor', 'Readjuster', 'Whig', 'Crawford Republican', 'Unknown', 'Jacksonian', 'Populist', 'Socialist', 'Union Democrat', 'Prohibitionist', 'Coalitionist', 'Pro-Administration', 'Anti Masonic', 'National Greenbacker', 'Farmer-Labor', 'Adams Democrat', 'Adams', 'Conservative', 'Independent', 'Conservative Republican', 'Jackson Republican', 'Progressive Republican', 'Anti Jacksonian', 'Liberty', 'Anti-Lecompton Democrat', 'Silver Republican', 'Ind. Republican-Democrat', 'Anti Jackson', 'States Rights', 'Republican', 'Unconditional Unionist', 'Unionist', 'American', 'Democrat-Liberal', 'Anti-Administration', 'Ind. Whig', 'Anti-Jacksonian', 'Nonpartisan', 'party', 'Progressive', 'Nullifier', 'Union', 'Liberal Republican', 'Jackson', 'Free Soil', 'Law and Order', 'Union Labor', 'Republican-Conservative', 'Free Silver', 'Ind. Republican', 'Democratic and Union Labor', 'New Progressive', 'Federalist', 'Readjuster Democrat', 'Ind. Democrat', 'Democrat',

## Missing values

Let's replace all missing values in the party column with the "No Party" label.

In [12]:
for row in legislators:
    if row[6] == '':
        row[6] = "No Party"

In [13]:
party = []

for item in legislators:
    party.append(item[6])

In [14]:
party = set(party)
print(party)

{'Constitutional Unionist', 'American Labor', 'Readjuster', 'Whig', 'Crawford Republican', 'Unknown', 'Jacksonian', 'Populist', 'Socialist', 'Union Democrat', 'Prohibitionist', 'No Party', 'Coalitionist', 'Pro-Administration', 'Anti Masonic', 'National Greenbacker', 'Farmer-Labor', 'Adams Democrat', 'Adams', 'Conservative', 'Independent', 'Conservative Republican', 'Jackson Republican', 'Progressive Republican', 'Anti Jacksonian', 'Liberty', 'Anti-Lecompton Democrat', 'Silver Republican', 'Ind. Republican-Democrat', 'Anti Jackson', 'States Rights', 'Republican', 'Unconditional Unionist', 'Unionist', 'American', 'Democrat-Liberal', 'Anti-Administration', 'Ind. Whig', 'Anti-Jacksonian', 'Nonpartisan', 'party', 'Progressive', 'Nullifier', 'Union', 'Liberal Republican', 'Jackson', 'Free Soil', 'Law and Order', 'Union Labor', 'Republican-Conservative', 'Free Silver', 'Ind. Republican', 'Democratic and Union Labor', 'New Progressive', 'Federalist', 'Readjuster Democrat', 'Ind. Democrat', 'De
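
Since the same replacement pattern is useful for any column, it can be wrapped in a small function. The sketch below is one possible version; `fill_missing` is a hypothetical helper written for this example (it is not part of the `csv` module), and it assumes that a missing value is stored as an empty string.

```python
def fill_missing(rows, column, label):
    """Hypothetical helper: replace empty strings in a column, in place.

    Returns the number of values that were replaced.
    """
    replaced = 0
    for row in rows:
        if row[column] == "":
            row[column] = label
            replaced += 1
    return replaced


# A few rows copied from the dataset.
sample_rows = [
    ["Bassett", "Richard", "1745-04-02", "M", "sen", "DE", "Anti-Administration"],
    ["Bland", "Theodorick", "1742-03-21", "", "rep", "VA", ""],
    ["Burke", "Aedanus", "1743-06-16", "", "rep", "SC", ""],
]

count = fill_missing(sample_rows, 6, "No Party")
print(count)
```

Returning the number of replacements makes it easy to check how many values were actually missing.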

### Training

Replace all missing values in the gender column with the value "M".

In [15]:
for row in legislators:
    if row[3] == '':
        row[3] = "M"

In [16]:
gender = []

for item in legislators:
    gender.append(item[3])

In [17]:
gender = set(gender)
print(gender)

{'M', 'F', 'gender'}


## Analysis of the years of birth

The **split()** method is widely used to **extract information** from date strings, as shown in the following example.

In [18]:
# Example
date = "2022-01-04"
date_parts = date.split('-')
date_parts

['2022', '01', '04']

In [19]:
year = date_parts[0]
year

'2022'

In [20]:
month = date_parts[1]
month

'01'

In [21]:
day = date_parts[2]
day

'04'
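
As an alternative to **split()**, the standard library's `datetime.strptime()` parses the string and returns the parts directly as integers. The sketch below assumes that dates follow the `YYYY-MM-DD` format used in this notebook; an empty or malformed string raises a `ValueError`, which is caught here with the try/except block covered later. The `valid` flag is only illustrative.

```python
from datetime import datetime

date = "2022-01-04"

# strptime() checks the format and converts each part to an integer.
parsed = datetime.strptime(date, "%Y-%m-%d")
print(parsed.year, parsed.month, parsed.day)

# An empty string does not match the format, so a ValueError is raised.
try:
    datetime.strptime("", "%Y-%m-%d")
    valid = True
except ValueError:
    valid = False
print(valid)
```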

### Training

In [22]:
birth_years = []

for row in legislators:
    date = row[2]
    date_parts = date.split('-')
    birth_years.append(date_parts[0])

In [23]:
print(birth_years[0:10])

['birthday', '1745', '1742', '1743', '1730', '1739', '', '1738', '1745', '1748']


## Try / Except block

In [24]:
# Motivation
int('')

ValueError: invalid literal for int() with base 10: ''

The **try/except** block lets the program continue executing even if an error occurs.

In [25]:
# Example
try:
    int('')
except Exception:
    print("Impossible to convert!")

Impossible to convert!


Let's take a closer look at what the exception object contains.

In [26]:
try:
    int('')
except Exception as e:
    print(type(e))
    print(str(e))

<class 'ValueError'>
invalid literal for int() with base 10: ''
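
Since we now know that the error is a `ValueError`, we can catch only that type and let unexpected errors surface. The sketch below shows one way to do it; `to_int` is a hypothetical helper written for this example, and returning `None` on failure is a design choice made here, not something imposed by Python.

```python
def to_int(text):
    """Hypothetical helper: return int(text), or None if the text is not a number."""
    try:
        return int(text)
    except ValueError:
        # Only conversion failures such as int('') are handled here.
        return None


results = [to_int("1745"), to_int(""), to_int("birthday")]
print(results)
```

Other errors are not hidden: for example, `to_int(None)` still raises a `TypeError`, because `except ValueError` does not match it.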


## The pass keyword

In [27]:
# Example
numbers = [1,2,3,4,5,6,7,8,9,10]

for i in numbers:
    try:
        int('')
    except Exception:
        print("There is an error!")

There is an error!
There is an error!
There is an error!
There is an error!
There is an error!
There is an error!
There is an error!
There is an error!
There is an error!
There is an error!


Inside a try/except block, the **pass** keyword is used when no action should be taken if an error occurs.

In [28]:
numbers = [1,2,3,4,5,6,7,8,9,10]

for i in numbers:
    try:
        int('')
    except Exception:
        pass

### Training

In [29]:
int_years = []

for year in birth_years:
    try:
        year = int(year)
    except Exception:
        pass
    int_years.append(year)

In [30]:
print(int_years[0:42])

['birthday', 1745, 1742, 1743, 1730, 1739, '', 1738, 1745, 1748, 1734, 1756, '', 1737, 1754, 1736, '', 1727, 1733, 1732, 1737, 1739, 1734, 1740, 1745, 1728, '', 1738, 1737, 1739, 1744, '', 1761, 1756, 1752, 1737, 1745, 1744, 1742, 1726, '', 1733]
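
With **pass**, the values that cannot be converted stay in the list as strings, so the result mixes integers and strings. If only the converted years are needed, one possible variant (a sketch on a short sample list, with an illustrative `skipped` counter) appends inside the `try` block and counts the failures instead of ignoring them.

```python
# Short sample in the same shape as birth_years.
birth_years = ["birthday", "1745", "1742", "", "1730"]

int_years = []
skipped = 0

for year in birth_years:
    try:
        # append() only runs when int() succeeds.
        int_years.append(int(year))
    except ValueError:
        skipped += 1

print(int_years)
print(skipped)
```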


## Convert the year of birth to an integer in the dataset

### Training

In [31]:
for row in legislators:
    
    birthday = row[2]
    birth_year = birthday.split('-')[0]
    
    try:
        birth_year = int(birth_year)
    except Exception:
        birth_year = 0
    
    row.append(birth_year)

In [32]:
legislators[0][7] = "birth_year"
print(legislators[0:10])

[['last_name', 'first_name', 'birthday', 'gender', 'type', 'state', 'party', 'birth_year'], ['Bassett', 'Richard', '1745-04-02', 'M', 'sen', 'DE', 'Anti-Administration', 1745], ['Bland', 'Theodorick', '1742-03-21', 'M', 'rep', 'VA', 'No Party', 1742], ['Burke', 'Aedanus', '1743-06-16', 'M', 'rep', 'SC', 'No Party', 1743], ['Carroll', 'Daniel', '1730-07-22', 'M', 'rep', 'MD', 'No Party', 1730], ['Clymer', 'George', '1739-03-16', 'M', 'rep', 'PA', 'No Party', 1739], ['Contee', 'Benjamin', '', 'M', 'rep', 'MD', 'No Party', 0], ['Dalton', 'Tristram', '1738-05-28', 'M', 'sen', 'MA', 'Pro-Administration', 1738], ['Elmer', 'Jonathan', '1745-11-29', 'M', 'sen', 'NJ', 'Pro-Administration', 1745], ['Few', 'William', '1748-06-08', 'M', 'sen', 'GA', 'Anti-Administration', 1748]]


## Modify the values of the missing years

Each missing year (stored as 0) is replaced with the birth year of the previous row.

### Training

In [33]:
last_value = 1

for row in legislators[1:]:  # skip the header row, whose last value is the text "birth_year"
    if row[7] == 0:
        row[7] = last_value
    last_value = row[7]

In [34]:
print(legislators[0:10])

[['last_name', 'first_name', 'birthday', 'gender', 'type', 'state', 'party', 'birth_year'], ['Bassett', 'Richard', '1745-04-02', 'M', 'sen', 'DE', 'Anti-Administration', 1745], ['Bland', 'Theodorick', '1742-03-21', 'M', 'rep', 'VA', 'No Party', 1742], ['Burke', 'Aedanus', '1743-06-16', 'M', 'rep', 'SC', 'No Party', 1743], ['Carroll', 'Daniel', '1730-07-22', 'M', 'rep', 'MD', 'No Party', 1730], ['Clymer', 'George', '1739-03-16', 'M', 'rep', 'PA', 'No Party', 1739], ['Contee', 'Benjamin', '', 'M', 'rep', 'MD', 'No Party', 1739], ['Dalton', 'Tristram', '1738-05-28', 'M', 'sen', 'MA', 'Pro-Administration', 1738], ['Elmer', 'Jonathan', '1745-11-29', 'M', 'sen', 'NJ', 'Pro-Administration', 1745], ['Few', 'William', '1748-06-08', 'M', 'sen', 'GA', 'Anti-Administration', 1748]]
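
The same idea can be written as a reusable function. The sketch below is one possible version, not the only way to do it: `forward_fill` is a hypothetical helper, the rows are shortened to two columns for readability, and the row named "Example" is made up to show what happens when the very first value is missing (it is left unchanged because there is no previous year to copy).

```python
def forward_fill(rows, column, missing=0):
    """Hypothetical helper: replace missing values with the previous known value."""
    last_value = None
    for row in rows[1:]:  # skip the header row
        if row[column] == missing:
            # Fill only when a previous value has already been seen.
            if last_value is not None:
                row[column] = last_value
        else:
            last_value = row[column]
    return rows


# Shortened rows: last name and birth year only.
sample_rows = [
    ["last_name", "birth_year"],
    ["Example", 0],   # made-up row: missing value with nothing before it
    ["Clymer", 1739],
    ["Contee", 0],    # missing value, filled with 1739
    ["Dalton", 1738],
]

forward_fill(sample_rows, 1)
print(sample_rows)
```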
