# Football Assignment
In this project, you will use the skills and concepts we discussed this semester to ingest, manipulate, analyze, and report data using Python.

Some of the more helpful concepts could be used to complete this notebook:
* basic syntax, len() function, variables
* conditionals
* looping
* data structures: lists, dictionaries, and sets
* pandas
* regex - this is helpful to get text patterns
* JSON - reading and writing JSON files
* Pathlib for accessing the files, regex (if desired)

You have been provided a set of JSON files describing football games from the 2017 season. The files may or may not include all the games from that season. If a statistic in the provided data conflicts with *actual* real-world data, the correct answer is in the *provided* data. 

Use only the JSON files contained in the 'Full' folders (not 'Flattended').

The objective of this project is to answer the set of questions below. Your project's output is a JSON file containing the question (key) and the answer (value). The keys must be in the format qn, and the answer must be a value appropriate for the question.

The 'season' includes all games provided (including bowl games).

In [1]:
# Example of how to answer a question
answer_file = {} # create blank dictionary
answer_file['q1'] = 'yes' # Answer 'yes' to Question 1
print(answer_file)

{'q1': 'yes'}


> You must name the file 'mis501_python_project_*netid*.json', for example, mis511_python_project_gjbott.json.

In [15]:
from pathlib import Path as path
import os
import json
from glob import glob
from pprint import pprint as p
import re
import math

In [3]:
def gen_file_name(game_path: str):
    file_name = path(game_path).name
    file_name = file_name.replace('.json', '')
    return re.sub(re.compile('[^a-zA-Z ]'), '', file_name).strip()

In [4]:
questions = {}


game_files = glob("./2017_football/**/full/*.json", recursive=True)

games = {}
for game_file in game_files:
    with open(game_file, "r") as fp:
        games[game_file] = json.load(fp)

## Question 1
How many games are in the data set?

In [5]:
questions["q1"] = len(games.keys())
questions['q1']

874

## Question 2
What are topmost keys for each game file?

In [6]:
keys = {} 
for file, game in games.items():
    game_keys = list(game.keys())
    keys[file] = game_keys[0]
questions['q2'] = keys
questions['q2']

{'./2017_football/Week 11/full/400935316 - Stanford vs Washington.json': 'scoringPlays',
 './2017_football/Week 11/full/400934547 - Kansas State vs West Virginia.json': 'scoringPlays',
 './2017_football/Week 11/full/400945295 - Air Force vs Wyoming.json': 'scoringPlays',
 './2017_football/Week 11/full/400937515 - Boston College vs NC State.json': 'scoringPlays',
 './2017_football/Week 11/full/400935405 - Minnesota vs Nebraska.json': 'scoringPlays',
 './2017_football/Week 11/full/400933916 - Mississippi State vs Alabama.json': 'scoringPlays',
 './2017_football/Week 11/full/400935407 - Ohio State vs Michigan State.json': 'scoringPlays',
 './2017_football/Week 11/full/400938659 - Rice vs Southern Mississippi.json': 'scoringPlays',
 './2017_football/Week 11/full/400944871 - Appalachian State vs Georgia Southern.json': 'scoringPlays',
 './2017_football/Week 11/full/400945299 - UNLV vs BYU.json': 'scoringPlays',
 './2017_football/Week 11/full/400934550 - Oklahoma vs TCU.json': 'scoringPlays'

One of the challenges in data analysis is that the data being analyzed may have irregularities or errors that impact the accuracy of the results. For example, does that data set you've been given represent ALL games in the 2017 season? (This is not a question I need you to answer. It's just an example.) Although verifying the accuracy of the data is an important step, we will limit our scope to the titles of the files. 


Within the data set you've been given, are all teams refenced the same way (e.g., Texas A&M, Texas A and M, Texas A & M)? Are teams or competitions referenced more than once? To help answer this question, provide a python list of the teams represented in this data set, sorted alphabetically. Examine the file names to determine if a football game (i.e., competition) is duplicated.

In [7]:
team_names = []

for file_name in games.keys():
    clean_fn = gen_file_name(file_name)
    [team1, team2] = clean_fn.split("vs")
    team_names.append(team1.strip())
    team_names.append(team2.strip())

unique_team_names = list(set(team_names))
unique_team_names.sort()
unique_team_names

['Abilene Christian',
 'Air Force',
 'Akron',
 'Alabama',
 'Alabama AM',
 'Alabama State',
 'Albany',
 'Alcorn State',
 'American',
 'Appalachian State',
 'Arizona',
 'Arizona State',
 'Arkansas',
 'Arkansas State',
 'ArkansasPine Bluff',
 'Army',
 'Auburn',
 'Austin Peay',
 'BYU',
 'Ball State',
 'Baylor',
 'BethuneCookman',
 'Boise State',
 'Boston College',
 'Bowling Green',
 'Buffalo',
 'Cal Poly',
 'California',
 'Central Arkansas',
 'Central Connecticut',
 'Central Michigan',
 'Charleston Southern',
 'Charlotte',
 'Chattanooga',
 'Cincinnati',
 'Clemson',
 'Coastal Carolina',
 'Colgate',
 'Colorado',
 'Colorado State',
 'Connecticut',
 'Delaware',
 'Delaware State',
 'Duke',
 'EAST',
 'East Carolina',
 'Eastern Illinois',
 'Eastern Kentucky',
 'Eastern Michigan',
 'Eastern Washington',
 'Elon',
 'Florida',
 'Florida AM',
 'Florida Atlantic',
 'Florida Intl',
 'Florida State',
 'Fordham',
 'Fresno State',
 'Furman',
 'GardnerWebb',
 'Georgia',
 'Georgia Southern',
 'Georgia State'

## Question 3
Are all teams referenced consistently? (yes/no)

In [8]:
questions["q3"] = "yes"

### Question 3.1
Provide a Python list of all the teams represented in the files, sorted alphabetically.

In [9]:
questions['q3.1'] = team_names 


## Question 4
Does the data seem reliable? 
* 'yes' or 'no'

In [10]:
questions["q4"] = "yes"

### Question 4.1 
Write a sentence or two in support of how you answered question four. It must be based on quantifiable reasons obtained from the data set. If you fixed anything in the data set, explain what you did and why.

In [11]:
questions["q4.1"] = "While I had to format the file paths so that I could place the matchup of each game as the key to the dictionary, I did not have to change much else. There appeared to be not repeat of school names throughout my list of data"

## Question 5
How many unique teams are represented in the data?

In [13]:
questions['q5'] = len(unique_team_names)
questions['q5']

217

## Question 6
Alabama has not always been blessed with strong placekickers. Is there evidence in the 2017 season that Alabama misses field goals more often than other teams nationwide? 
qn = 'yes' or 'no'
qn+1 = Write a sentence or two supporting how you answered qn. It must include quantifiable reasons obtained from the data set.

In [18]:

total_fgs = 0
succesful_fgs = 0
alabama_fgs = 0
alabama_succesful_fgs = 0

for game in games.values():
    for drive in game['drives']['previous']:
        try:
            if drive["displayResult"] == "Field Goal":
                total_fgs += 1
                succesful_fgs += 1
                if drive["team"]["abbreviation"] == "ALA":
                    alabama_fgs += 1
                    alabama_succesful_fgs += 1
            if drive['displayResult'] == 'Missed FG':
                total_fgs += 1
                if drive["team"]["abbreviation"] == "ALA":
                    alabama_fgs += 1
        except KeyError:
            pass

overall_per = round((succesful_fgs/total_fgs) * 100, 2)
alabama_per = round((alabama_succesful_fgs/alabama_fgs) * 100, 2)

print(f"Overall FG Percentage: {overall_per}")
print(f"Alabama FG Percentage: {alabama_fgs}")

questions["q6"] = "yes"
questions["q6.1"] = f"This is because the overall FG percentage came in at {overall_per}% and the Alabama FG percentage came in at {alabama_per}%"
questions["q6.1"]

Overall FG Percentage: 73.06
Alabama FG Percentage: 26


'This is because the overall FG percentage came in at 73.06% and the Alabama FG percentage came in at 69.23%'

## Question 7
A *saftey* in football refers to when the offensive player who has possession of the football is tackled or willingly downs the ball in their end zone. Two points are awared to the defensive team. The offensive team loses possesion of the ball.

In how many games did a safety occur?

In [19]:
def is_safety(drive):
    try:
        if drive['displayResult'] == 'Safety':
            return True
        return False
    except KeyError:
        return False

def safety_occured(game):
    for drive in game['drives']['previous']:
        if is_safety(drive):
            return True
    return False


questions["q7"] = 0
for game in games.values():
    if safety_occured(game):
        questions["q7"] += 1

questions['q7']

46

## Question 8
Which team scored the most safeties (include all teams with the same number if tied)?

In [20]:
safeties = {}

for game in games.values():
    for drive in game['drives']['previous']:
        try:
            if drive['displayResult'] == 'Safety':
                safeties[drive['team']['abbreviation']] = safeties.get(drive['team']['abbreviation'], 0) + 1
        except KeyError:
            pass

safety_list = list(map(lambda x: (x[0], x[1]), safeties.items()))
safety_list.sort(key=lambda x: x[1], reverse=True)
top_safety = safety_list[0][1]
questions["q8"] = list(filter(lambda x: x[1] == top_safety, safety_list))
questions['q8']

[('MIZ', 2), ('USU', 2), ('IW', 2), ('CHSO', 2)]

## Question 9
Which teams (include all, if tied) gave up the most safeties?

In [24]:
safeties_given_up = {}

for game in games.values():
    [teamA, teamB] = game['teams']
    for drive in game['drives']['previous']:
        try:
            if drive['displayResult'] == 'Safety':
                if(drive['team']['abbreviation'] == teamA['team']['abbreviation']):
                    safeties_given_up[teamB['team']['abbreviation']] = safeties_given_up.get(teamB['team']['abbreviation'], 0) + 1
                else:
                    safeties_given_up[teamA['team']['abbreviation']] = safeties_given_up.get(teamA['team']['abbreviation'], 0) + 1
        except KeyError:
            pass

safety_list = list(map(lambda x: (x[0], x[1]), safeties_given_up.items()))
safety_list.sort(key=lambda x: x[1], reverse=True)
top_safety = safety_list[0][1]
questions["q9"] = list(filter(lambda x: x[1] == top_safety, safety_list))
questions["q9"]

[('FRES', 3)]

## Question 10
Find the longest play for the 2017 season. (Ex. a 99 yard interception return) If there are several
of the same length, show them all. Show team matchup, quarter, clocktime, and play text for each of the plays.

In [26]:


def build_plays(matchup, game):
    plays = []
    for drive in game['drives']['previous']:
        try:
            for play in drive['plays']:
                plays.append({
                    "length": play['start']['distance'],
                    "matchup": matchup, 
                    "quarter": play['period']['number'],
                    "clocktime": play['clock']['displayValue'],
                    "play_text": play['text']
                })

            return plays
        except KeyError:
            pass

raw_plays = [build_plays(matchup, game) for (matchup, game) in games.items()] 

plays = []
for raw_play in raw_plays:
    for play in raw_play:
        plays.append(play)

plays.sort(key=lambda x: x['length'], reverse=True)
questions["q10"] = plays[0]
questions['q10']

{'length': 65,
 'matchup': './2017_football/Week 3/full/400937464 - Old Dominion vs North Carolina.json',
 'quarter': 1,
 'clocktime': '10:21',
 'play_text': 'Jones, F kickoff 65 yards to the ODU0, HARPER, Isaiah return to the ODU21 (Artis, A;Ross, D), PENALTY NC offside defense 5 yards to the NC30, NO PLAY.'}

## Question 11
How long were Alabama's FIRST and LAST offensive plays of the season? Provide the description of each play including the yardage.

In [28]:
def get_alabama_games(games):
    def is_alabama_game(game):
        [teamA, teamB] = game['teams']
        return teamA['team']['abbreviation'] == 'ALA' or teamB['team']['abbreviation'] == 'ALA'

    return list(filter(lambda x: is_alabama_game(x), games))

def get_first_play(game):
    for drive in game['drives']['previous']:
        if(drive['team']['abbreviation'] == 'ALA'):
            try:
                return drive['plays'][0]
            except KeyError:
                pass
    return None

def get_last_play(game):
    drives = game['drives']['previous']
    for i in range(len(drives) -1, -1, -1):
        if(drives[i]['team']['abbreviation'] == 'ALA'):
            try:
                return drives[i]['plays'][-1]
            except KeyError:
                pass
    return None


alabama_games = get_alabama_games(games.values())


#sort games by week
alabama_games.sort(key=lambda x: x['week'], reverse=True)

#grab first and last game
first_game = alabama_games[0]
last_game = alabama_games[-1]

first_play = get_first_play(first_game)
last_play = get_last_play(last_game)

questions["q11"] = (first_play, last_play)
questions['q11']

({'period': {'number': 1},
  'homeScore': 0,
  'start': {'shortDownDistanceText': '1st and 10',
   'possessionText': 'ALA 14',
   'downDistanceText': '1st and 10 at ALA 14',
   'distance': 10,
   'yardLine': 86,
   'team': {'id': '333'},
   'down': 1,
   'yardsToEndzone': 86},
  'scoringPlay': False,
  'clock': {'displayValue': '12:17'},
  'type': {'id': '5', 'text': 'Rush', 'abbreviation': 'RUSH'},
  'priority': False,
  'statYardage': 9,
  'awayScore': 0,
  'wallclock': '2017-11-25T20:48:00Z',
  'modified': '2017-11-25T20:48Z',
  'end': {'shortDownDistanceText': '2nd and 1',
   'possessionText': 'ALA 23',
   'downDistanceText': '2nd and 1 at ALA 23',
   'distance': 1,
   'yardLine': 77,
   'team': {'id': '333'},
   'down': 2,
   'yardsToEndzone': 77},
  'id': '400933932101878201',
  'text': 'Damien Harris run for 9 yds to the Alab 23'},
 {'period': {'number': 4},
  'homeScore': 24,
  'start': {'shortDownDistanceText': '4th and 5',
   'possessionText': 'FSU 43',
   'downDistanceText':

## Question 12
How many times did Alabama punt in the 2017 season?

In [None]:
punts = 0

for game in games.values():
    for drive in game['drives']['previous']:
        try:
            if drive['displayResult'] == 'Punt' and drive['team']['abbreviation'] == 'ALA':
                punts += 1
        except KeyError:
            pass

questions["12"] = punts

In [None]:
with open('mis501_python_project_ctcallahan2.json', "w") as f:
    json.dump(questions, f)