# Challenge 2
## Working with the NFL suspension dataset

### Import, read the file

In [49]:
import csv

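# Read every row of the CSV; the with-block closes the file afterwards
with open('nfl-suspensions-data.csv', newline='') as f:
    nfl_suspensions = list(csv.reader(f))

# Drop the header row, then display the data rows
nfl_suspensions = nfl_suspensions[1:]
print(nfl_suspensions)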

[['F. Davis', 'WAS', 'Indef.', 'Substance abuse, repeated offense', 'Marijuana-related', '2014', 'http://www.cbssports.com/nfl/eye-on-football/24448694/redskins-te-fred-davis-suspended-Indefiniteinitely-by-nfl'], ['J. Blackmon', 'JAX', 'Indef.', 'Substance abuse, repeated offense', '', '2014', 'http://espn.go.com/nfl/story/_/id/11257934/justin-blackmon-jacksonville-jaguars-arrested-marijuana-possession'], ['L. Brazill', 'IND', 'Indef.', 'Substance abuse, repeated offense', '', '2014', 'http://www.nfl.com/news/story/0ap2000000364622/article/lavon-brazill-released-by-colts-in-wake-of-suspension'], ['T. Jackson', 'WAS', 'Indef.', 'Substance abuse, repeated offense', '', '2014', 'http://www.nfl.com/news/story/0ap2000000364087/article/tanard-jackson-suspended-Indefiniteinitely-by-nfl'], ['M. Hapes', 'NYG', 'Indef.', 'Personal conduct', 'Gambling-related', '1946', 'http://espn.go.com/blog/nflnation/tag/_/name/frank-filchock'], ['R. Rice', 'BAL', 'Indef.', 'Personal conduct', 'Domestic violen

In [8]:
years = {}

for row in nfl_suspensions:
    year = row[5]
    if year in years:
        years[year] +=1
    else:
        years[year] = 1
print(years)

{'2014': 29, '1946': 1, '1947': 1, '2010': 21, '2008': 10, '2007': 17, '1983': 1, '2009': 10, '2005': 8, '2000': 1, '2012': 45, '2001': 3, '2006': 11, '1989': 17, '   ': 1, '1963': 1, '2013': 40, '1990': 3, '2011': 13, '2004': 6, '2002': 7, '2003': 9, '1997': 3, '1999': 5, '1993': 1, '1995': 1, '1998': 2, '1994': 1, '1986': 1}
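The counting loop above can also be written with `collections.Counter` from the standard library. This is a minimal sketch, not the notebook's original method. The three sample rows are copied from the output printed earlier, with the URL column replaced by an empty string, and the column meanings in the comment are inferred from those rows rather than taken from the CSV header.

```python
from collections import Counter

# Sample rows in the same column order as the CSV
# (assumed: name, team, games, category, description, year, source).
sample_rows = [
    ['F. Davis', 'WAS', 'Indef.', 'Substance abuse, repeated offense', 'Marijuana-related', '2014', ''],
    ['J. Blackmon', 'JAX', 'Indef.', 'Substance abuse, repeated offense', '', '2014', ''],
    ['M. Hapes', 'NYG', 'Indef.', 'Personal conduct', 'Gambling-related', '1946', ''],
]

# Counter tallies each distinct year without an explicit if/else
years_sample = Counter(row[5] for row in sample_rows)
print(years_sample)
```

Replacing `sample_rows` with `nfl_suspensions` should give the same counts as the dictionary printed above.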


## Get unique values for Teams and Games

### Teams

In [18]:
teams = [row[1] for row in nfl_suspensions]

In [22]:
unique_teams = list(set(teams))
print(unique_teams)

['TB', 'PIT', 'DAL', 'CHI', 'PHI', 'CLE', 'ARI', 'OAK', 'KC', 'DET', 'NYJ', 'BAL', 'TEN', 'STL', 'IND', 'SEA', 'NYG', 'FREE', 'DEN', 'SD', 'NE', 'CIN', 'HOU', 'MIN', 'GB', 'SF', 'ATL', 'JAX', 'BUF', 'CAR', 'LA', 'NO', 'MIA', 'WAS']


### Games

In [25]:
games = [row[2] for row in nfl_suspensions]
unique_games = list(set(games))
print(unique_games)

['20', '8', '4', '2', '36', '32', '1', '6', '14', '16', '10', '5', 'Indef.', '3']
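Because a `set` has no defined order, the unique values print in an arbitrary sequence. One possible way to make the list easier to read is to sort numeric values by size and push non-numeric ones such as `'Indef.'` to the end. The helper name `games_sort_key` is hypothetical and only used for this sketch.

```python
# Unique values as printed above
unique_games = ['20', '8', '4', '2', '36', '32', '1', '6', '14', '16', '10', '5', 'Indef.', '3']

def games_sort_key(value):
    # Numeric strings sort by their integer value;
    # anything else (e.g. 'Indef.') is placed after all numbers
    if value.isdigit():
        return (0, int(value))
    return (1, 0)

sorted_games = sorted(unique_games, key=games_sort_key)
print(sorted_games)
```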


## Create class Suspension

In [81]:
class Suspension():
    def __init__(self,line):
        self.name = nfl_suspensions[line-1][0]
        self.team = nfl_suspensions[line-1][1]
        self.games = nfl_suspensions[line-1][2]
        self.year = nfl_suspensions[line-1][5]

third_line = Suspension(3)

# Alternatively, pass the row itself rather than a line number

class Suspension():
    def __init__(self,line):
        self.name = line[0]
        self.team = line[1]
        self.games = line[2]
        self.year = line[5]

third_line = Suspension(nfl_suspensions[22])


In [82]:
print(third_line.name)
print(third_line.year)
print(third_line.team)
print(third_line.games)

P. Hornung
   
GB
14


## Improve this class


In [104]:
class Suspension():

    def __init__(self, line):
        self.name = line[0]
        self.team = line[1]
        self.games = line[2]
        # A blank year (whitespace only) is stored as 0
        if line[5].strip() == '':
            self.year = 0
        else:
            self.year = int(line[5])

    def get_year(self):
        return self.year

In [106]:
# third_line = Suspension(line =nfl_suspensions[2])
missing_year = Suspension(nfl_suspensions[21])
print(missing_year.year)

# used to find a missing year
count = 0
for row in nfl_suspensions:
    if row[0] !='P. Hornung':
        count += 1
    #else:
    #   print(count)

2006
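The row with the blank year can also be located directly with `enumerate`, instead of counting rows by hand. The sketch below uses a hypothetical helper, `rows_missing_year`, and two sample rows. Only the name, team, games and year fields are taken from the outputs above; the other columns are left empty here because they are not shown in this notebook.

```python
def rows_missing_year(rows, year_col=5):
    # Return (index, row) pairs whose year field is empty or whitespace-only
    return [(i, row) for i, row in enumerate(rows) if not row[year_col].strip()]

# Sample rows; columns not shown in the notebook output are left as ''
sample_rows = [
    ['F. Davis', 'WAS', 'Indef.', 'Substance abuse, repeated offense', 'Marijuana-related', '2014', ''],
    ['P. Hornung', 'GB', '14', '', '', '   ', ''],
]

missing = rows_missing_year(sample_rows)
for index, row in missing:
    print(index, row[0])
```

Calling `rows_missing_year(nfl_suspensions)` on the full dataset would return every such row along with its index.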


# Personal bonuses!

## Number of suspensions by type

In [108]:
suspension_type = []
suspension = [row[3] for row in nfl_suspensions]
suspension_type = list(set(suspension))

In [109]:
print(suspension_type)

['Substance abuse', 'In-game violence', 'PEDs', 'Substance abuse, repeated offense', 'Personal conduct', 'PEDs, repeated offense']


In [112]:
count_suspension_type = {}

for row in nfl_suspensions:
    suspension = row[3]
    if suspension in count_suspension_type:
        count_suspension_type[suspension] +=1
    else:
        count_suspension_type[suspension] =1

In [113]:
print(count_suspension_type)

{'Substance abuse, repeated offense': 20, 'Personal conduct': 60, 'PEDs, repeated offense': 6, 'PEDs': 134, 'In-game violence': 10, 'Substance abuse': 39}


Surprisingly, the most common cause of suspension is the use of performance-enhancing drugs (PEDs), with 134 cases, followed by personal conduct with 60.

Repeated PED offenses are rare, in contrast to repeated substance abuse.

In [115]:
PED_repeated = 6 / (134 /100)
print(PED_repeated)

4.477611940298507


In [116]:
Substance_repeated = 20 / (39/100)
print(Substance_repeated)

51.28205128205128


As we can see, repeated PED offenses are rare: the "repeated offense" count is only about 4.5% of the first-offense PED count.

Repeated substance abuse is far more common, with a striking 51.3% by the same measure. Let's have a look!
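The two percentages above can be derived straight from the counts dictionary rather than typed in by hand. This is a small sketch; the helper name `repeat_ratio` is hypothetical, and it assumes every repeated category is labeled as the base category followed by `', repeated offense'`, as in the output above. Note that it compares two suspension categories, so it is not the share of players who reoffend.

```python
# Counts as printed above
count_suspension_type = {
    'Substance abuse, repeated offense': 20,
    'Personal conduct': 60,
    'PEDs, repeated offense': 6,
    'PEDs': 134,
    'In-game violence': 10,
    'Substance abuse': 39,
}

def repeat_ratio(counts, category):
    # Repeated-offense suspensions per 100 first-offense suspensions
    repeated = counts[category + ', repeated offense']
    return 100 * repeated / counts[category]

ped_repeated = repeat_ratio(count_suspension_type, 'PEDs')
substance_repeated = repeat_ratio(count_suspension_type, 'Substance abuse')
print(round(ped_repeated, 1), round(substance_repeated, 1))
```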

In [128]:
substance_taken = {}

for row in nfl_suspensions:
    suspension = row[3]
    substance = row[4]
    if suspension == 'Substance abuse' or suspension == 'Substance abuse, repeated offense':
        if substance in substance_taken:
            substance_taken[substance] +=1
        else:
            substance_taken[substance] =1
print(substance_taken)
            
            

{'Marijuana-related': 15, '': 40, 'Arrest, felony possession of codeine': 1, 'Cocaine-related': 1, 'Found with dried urine and "The Original Whizzinator"': 1, 'Alcohol-related': 1}


As we can see, in most cases (40 of 59) the substance is not recorded, and marijuana is the most common recorded substance (15). These counts cover all 59 substance-abuse suspensions, first and repeated offenses combined. Let's now look at repeated offenses only.

In [130]:
substance_taken = {}

for row in nfl_suspensions:
    suspension = row[3]
    substance = row[4]
    if suspension == 'Substance abuse, repeated offense':
        if substance in substance_taken:
            substance_taken[substance] +=1
        else:
            substance_taken[substance] =1
print(substance_taken)

{'Marijuana-related': 4, '': 13, 'Arrest, felony possession of codeine': 1, 'Cocaine-related': 1, 'Found with dried urine and "The Original Whizzinator"': 1}


The single alcohol-related suspension no longer appears in the list, so it falls under the first-offense category.

As before, most cases (13 of 20) don't record the substance (maybe because several drugs were involved and no single one could be chosen), and marijuana still tops the recorded list!

### Personal conduct!

Let's have a look at the second most common cause of suspension, personal conduct!


In [134]:
behaviour_list_count = {}

for row in nfl_suspensions:
    suspension = row[3]
    reason = row[4]
    if suspension == 'Personal conduct':
        if reason in behaviour_list_count:
            behaviour_list_count[reason] +=1
        else:
            behaviour_list_count[reason] = 1

In [135]:
print(behaviour_list_count)

{'Gambling-related': 5, 'Domestic violence': 15, 'Multiple arrests': 2, 'DUI manslaughter': 2, 'Arrest, possession of weapon': 1, 'Improper gifts while in college': 1, 'DUI arrest, multiple': 2, 'DUI arrest, drugs': 1, 'Arrest, possession of weapon and marijuana': 1, 'Mulitple arrests': 1, 'Accused of disorderly conduct, resisting arrest': 1, 'Sexual assault': 1, 'Misdemeanour assault': 2, 'DUI arrest': 6, 'Reckless driving': 2, 'Arrest, cocaine possession': 1, 'Misdemeanour gun charge': 1, 'DUI, marijuana-related': 1, 'Arrest, misdemeanour assault for pointing gun at stripper': 1, 'Marijuana-related': 1, 'Arrest, Marijuana-related': 1, '': 1, 'Dogfighting': 1, 'Accused of battery and resisting arrest': 1, 'Assault': 1, 'DUI arrest, possession of marijuana': 1, "Simple assault for spitting drink in a woman's face": 1, 'Felony arrest charges': 1, 'Disorderly conduct': 1, 'Simple battery, resisting arrest': 1, 'DUI (acquitted)': 1, 'Child endangerment': 1}


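The top three reasons are domestic violence (sportsmanship at its finest) with 15 cases, DUI arrests with 6 and gambling-related issues with 5! Note that several DUI variants, and the misspelled 'Mulitple arrests', are counted under separate labels, so related cases are spread over more than one entry.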
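The ranking can be read off programmatically with `Counter.most_common`. The sketch below uses only a subset of the counts printed above, to keep it short; applying the same call to the full `behaviour_list_count` dictionary should give the same top three, since no other label exceeds 2.

```python
from collections import Counter

# Subset of the counts printed above (not the full dictionary)
behaviour_subset = Counter({
    'Gambling-related': 5,
    'Domestic violence': 15,
    'Misdemeanour assault': 2,
    'DUI arrest': 6,
    'Reckless driving': 2,
    'Dogfighting': 1,
})

# most_common(3) returns the three highest counts as (label, count) pairs
top_three = behaviour_subset.most_common(3)
print(top_three)
```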

##### That's it for this dataset, thanks for reading!