# Football Assignment
In this project, you will use the skills and concepts we discussed this semester to ingest, manipulate, analyze, and report data using Python.

Some of the more helpful concepts that could be used to complete this notebook:
* basic syntax, len() function, variables
* conditionals
* looping
* data structures: lists, dictionaries, and sets
* pandas
* regex - this is helpful to get text patterns
* JSON - reading and writing JSON files
* Pathlib for accessing the files

You have been provided a set of JSON files describing football games from the 2017 season. The files may or may not include all the games from that season. If a statistic in the provided data conflicts with *actual* real-world data, the correct answer is in the *provided* data. 

Use only the JSON files contained in the 'Full' folders (not 'Flattened').
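
As a rough illustration of restricting the search to the 'Full' folders, the sketch below uses Pathlib to walk a directory tree and keep only JSON files whose parent folder is named 'full' (compared case-insensitively). The helper name `find_full_game_files`, the `week_1` folder, the `NotFull` folder, and the sample file name are all assumptions made for the demonstration; the real layout of the provided data may differ.

```python
import json
import tempfile
from pathlib import Path


def find_full_game_files(root):
    """Return sorted paths of JSON files whose parent folder is named 'full' (any case)."""
    return sorted(
        p for p in Path(root).rglob("*.json")
        if p.parent.name.lower() == "full"
    )


# Build a tiny throwaway tree to demonstrate (hypothetical layout and file names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for folder in ("Full", "NotFull"):
        (root / "week_1" / folder).mkdir(parents=True)
        sample = root / "week_1" / folder / "Alabama vs Florida State.json"
        sample.write_text(json.dumps({}))

    found = find_full_game_files(root)
    found_folders = [p.parent.name for p in found]
    print(found_folders)  # only the file under 'Full' is kept
```

Comparing the folder name in lowercase is a design choice that avoids depending on whether the folders are spelled 'Full' or 'full' on disk.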

The objective of this project is to answer the set of questions below. Your project's output is a JSON file that maps each question (key) to its answer (value). Each key must be in the format qn, where n is the question number (for example, 'q1'), and each answer must be a value appropriate for the question.

The 'season' includes all games provided (including bowl games).

In [1]:
# Example of how to answer a question
answer_file = {} # create blank dictionary
answer_file['q1'] = 'yes' # Answer 'yes' to Question 1
print(answer_file)

{'q1': 'yes'}


> You must name the file 'mis501_python_project_*netid*.json', for example, mis501_python_project_gjbott.json.

In [2]:
from pathlib import Path as path
import os
import json
from glob import glob
from pprint import pprint as p
import re

In [3]:
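def gen_file_name(game_path: str) -> str:
    """Return the game name from a file path, keeping only letters and spaces."""
    file_name = path(game_path).name
    file_name = file_name.replace('.json', '')
    # Note: this also strips '&' and '-', so a name like
    # 'Arkansas-Pine Bluff' becomes 'ArkansasPine Bluff'.
    return re.sub(r'[^a-zA-Z ]', '', file_name).strip()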

In [17]:
questions = {}


game_files = glob("./2017_football/**/full/*.json", recursive=True)

games = {}
for game_file in game_files:
    with open(game_file, "r") as fp:
        games[gen_file_name(game_file)] = json.load(fp)

## Question 1
How many games are in the data set?

In [18]:
questions["q1"] = len(games)

## Question 2
What are the topmost keys for each game file?

In [19]:
list(games[list(games.keys())[0]].keys())[0]

'scoringPlays'

In [20]:
# Store all top-level keys of the first game, not just the first key
questions["q2"] = list(games[list(games.keys())[0]].keys())

One of the challenges in data analysis is that the data being analyzed may have irregularities or errors that impact the accuracy of the results. For example, does the data set you've been given represent ALL games in the 2017 season? (This is not a question I need you to answer. It's just an example.) Although verifying the accuracy of the data is an important step, we will limit our scope to the titles of the files.


Within the data set you've been given, are all teams referenced the same way (e.g., Texas A&M, Texas A and M, Texas A & M)? Are teams or competitions referenced more than once? To help answer this question, provide a Python list of the teams represented in this data set, sorted alphabetically. Examine the file names to determine if a football game (i.e., competition) is duplicated.

In [22]:
unique_game_names = len(list(games.keys()))
num_games = len(game_files)

f"{num_games} total games and {unique_game_names} unique game names"

'874 total games and 868 unique game names'
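
The gap between the two counts suggests that some game names occur more than once. One possible way to see which names repeat is to count the file stems with `collections.Counter`, as sketched below. The paths in `sample_files` are made-up stand-ins for the real `game_files` list, and the sketch uses the raw file stem rather than the cleaned name from `gen_file_name`.

```python
from collections import Counter
from pathlib import Path

# Hypothetical file paths standing in for the real `game_files` list.
sample_files = [
    "./2017_football/week_1/full/Alabama vs Florida State.json",
    "./2017_football/week_2/full/Alabama vs Fresno State.json",
    "./2017_football/bowls/full/Alabama vs Florida State.json",
]

# Count how often each file stem (file name without '.json') appears.
name_counts = Counter(Path(f).stem for f in sample_files)

# Keep only the names that appear more than once.
duplicated = {name: n for name, n in name_counts.items() if n > 1}
print(duplicated)
```

A repeated name does not by itself prove a duplicated game, since two teams could meet twice in a season, so the contents of the matching files would still need to be compared.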

## Question 3
Are all teams referenced consistently? (yes/no)

In [23]:
questions["q3"] = "yes"
game_names = list(games.keys())
game_names.sort()
game_names

['Air Force vs Army',
 'Air Force vs San Diego State',
 'Air Force vs UNLV',
 'Air Force vs Utah State',
 'Air Force vs VMI',
 'Air Force vs Wyoming',
 'Akron vs ArkansasPine Bluff',
 'Akron vs Ball State',
 'Akron vs Buffalo',
 'Akron vs Iowa State',
 'Akron vs Kent State',
 'Akron vs Ohio',
 'Alabama vs Arkansas',
 'Alabama vs Colorado State',
 'Alabama vs Florida State',
 'Alabama vs Fresno State',
 'Alabama vs LSU',
 'Alabama vs Mercer',
 'Alabama vs Ole Miss',
 'Alabama vs Tennessee',
 'Appalachian State vs Coastal Carolina',
 'Appalachian State vs Georgia Southern',
 'Appalachian State vs Louisiana',
 'Appalachian State vs New Mexico State',
 'Appalachian State vs Savannah State',
 'Appalachian State vs Wake Forest',
 'Arizona State vs Arizona',
 'Arizona State vs Colorado',
 'Arizona State vs NC State',
 'Arizona State vs New Mexico State',
 'Arizona State vs Oregon',
 'Arizona State vs San Diego State',
 'Arizona State vs USC',
 'Arizona State vs Washington',
 'Arizona vs Houst

### Question 3.1
Provide a Python list of all the teams represented in the files, sorted alphabetically.

In [24]:
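# Split each 'Team A vs Team B' game name into its two team names
teams = set()
for game_name in games:
    teams.update(team.strip() for team in game_name.split(" vs "))
questions['q3.1'] = sorted(teams)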


## Question 4
Does the data seem reliable? 
* 'yes' or 'no'

In [None]:
questions["q4"] = "yes"

### Question 4.1 
Write a sentence or two in support of how you answered question four. It must be based on quantifiable reasons obtained from the data set. If you fixed anything in the data set, explain what you did and why.

In [None]:
json.dumps(games)

## Question 5
How many unique teams are represented in the data?
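
A minimal sketch of one way to count unique teams, assuming every game name follows the 'Team A vs Team B' pattern seen in the output earlier in this notebook. The short `sample_game_names` list is a stand-in for the full set of game names.

```python
# A few game names in the 'Team A vs Team B' format (stand-in for the full list).
sample_game_names = [
    "Air Force vs Army",
    "Akron vs Ohio",
    "Alabama vs Arkansas",
    "Air Force vs UNLV",
]

# A set keeps each team once, no matter how many games it appears in.
teams = set()
for name in sample_game_names:
    teams.update(part.strip() for part in name.split(" vs "))

unique_team_count = len(teams)
print(unique_team_count, sorted(teams))
```

This count is only as good as the naming: if the same team is spelled two different ways in the file names, it will be counted twice.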

## Question 6
Alabama has not always been blessed with strong placekickers. Is there evidence in the 2017 season that Alabama misses field goals more often than other teams nationwide? 
* qn = 'yes' or 'no'
* qn+1 = Write a sentence or two supporting how you answered qn. It must include quantifiable reasons obtained from the data set.
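
One way to make the comparison quantifiable is to compute a miss rate (missed attempts divided by total attempts) for Alabama and for all other teams combined. The sketch below shows only the arithmetic. The counts in `kick_counts` are invented for illustration and are not taken from the data set; extracting the real counts from the JSON files is a separate step that depends on their structure.

```python
def miss_rate(made, missed):
    """Return the share of field goal attempts that were missed."""
    attempts = made + missed
    return missed / attempts if attempts else 0.0


# Made-up counts for illustration only; real values must come from the JSON files.
kick_counts = {
    "Alabama": {"made": 15, "missed": 5},
    "Team B": {"made": 18, "missed": 2},
    "Team C": {"made": 12, "missed": 4},
}

alabama_rate = miss_rate(**kick_counts["Alabama"])

# Pool every other team's kicks into one nationwide comparison group.
others = [counts for team, counts in kick_counts.items() if team != "Alabama"]
others_rate = miss_rate(
    sum(c["made"] for c in others),
    sum(c["missed"] for c in others),
)

print(f"Alabama: {alabama_rate:.1%}, all other teams: {others_rate:.1%}")
```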

## Question 7
A *safety* in football refers to when the offensive player who has possession of the football is tackled or willingly downs the ball in their own end zone. Two points are awarded to the defensive team. The offensive team loses possession of the ball.

In how many games did a safety occur?
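
The sketch below shows one possible shape for this count. Only the top-level 'scoringPlays' key is known from the data shown above. The nested fields used here (`type` → `text`, `team` → `displayName`) and the team names are assumptions, so they should be checked against a real game file before reuse.

```python
from collections import Counter

# Toy data. The nested field names are assumptions, not the confirmed file schema.
sample_games = {
    "Team A vs Team B": {"scoringPlays": [
        {"type": {"text": "Safety"}, "team": {"displayName": "Team A"}},
        {"type": {"text": "Field Goal"}, "team": {"displayName": "Team B"}},
    ]},
    "Team C vs Team D": {"scoringPlays": [
        {"type": {"text": "Rushing Touchdown"}, "team": {"displayName": "Team C"}},
    ]},
    "Team A vs Team D": {},  # a game with no scoring plays recorded
}


def safety_plays(game):
    """Return the scoring plays in one game whose (assumed) type text is 'Safety'."""
    plays = game.get("scoringPlays", [])
    return [p for p in plays if p.get("type", {}).get("text") == "Safety"]


# Number of games containing at least one safety.
games_with_safety = sum(1 for g in sample_games.values() if safety_plays(g))

# Safeties credited to each scoring team.
safeties_by_team = Counter(
    play["team"]["displayName"]
    for game in sample_games.values()
    for play in safety_plays(game)
)

print(games_with_safety, dict(safeties_by_team))
```

Using `.get()` with defaults keeps the loop from failing on games that lack the expected keys.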

## Question 8
Which team scored the most safeties (include all teams with the same number if tied)?

## Question 9
Which teams (include all, if tied) gave up the most safeties?

## Question 10
Find the longest play for the 2017 season (e.g., a 99-yard interception return). If there are several of the same length, show them all. Show the team matchup, quarter, clock time, and play text for each of the plays.
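
If the play length has to be read from the play text, a regular expression is one option. The sketch below pulls the first yardage figure out of each text and keeps every play tied for the maximum. The dictionary keys (`matchup`, `quarter`, `clock`, `text`) and the sample texts are hypothetical; the real files may store yardage in a dedicated field, which would be more reliable than parsing text.

```python
import re

# Matches a number followed by 'yd', 'yds', 'yard', or 'yards' (any case).
YARDS = re.compile(r"(\d+)[ -]?(?:yards?|yds?)\b", re.IGNORECASE)

# Hypothetical plays; keys and texts are made up for the demonstration.
sample_plays = [
    {"matchup": "Team A vs Team B", "quarter": 2, "clock": "10:31",
     "text": "J. Doe 99 Yd Interception Return"},
    {"matchup": "Team C vs Team D", "quarter": 4, "clock": "0:42",
     "text": "R. Roe pass complete for 99 yds for a TD"},
    {"matchup": "Team A vs Team D", "quarter": 1, "clock": "7:15",
     "text": "P. Poe run for 12 yds"},
    {"matchup": "Team B vs Team C", "quarter": 3, "clock": "3:03",
     "text": "Timeout Team B"},
]


def play_length(text):
    """Return the first yardage found in the play text, or None if there is none."""
    match = YARDS.search(text)
    return int(match.group(1)) if match else None


# Pair each play with its length and drop plays without any yardage.
measured = [(play_length(p["text"]), p) for p in sample_plays]
measured = [(n, p) for n, p in measured if n is not None]

longest = max(n for n, _ in measured)
longest_plays = [p for n, p in measured if n == longest]

for p in longest_plays:
    print(p["matchup"], p["quarter"], p["clock"], p["text"])
```

The pattern reads only the first yardage in a text and ignores direction, so a loss or a penalty distance would be treated like a gain. Real play texts would need extra handling for those cases.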

## Question 11
How long were Alabama's FIRST and LAST offensive plays of the season? Provide the description of each play including the yardage.

## Question 12
How many times did Alabama punt in the 2017 season?

In [2]:
# Write the collected answers (not an empty dictionary) to the output file
with open('mis501_python_project_ctcallahan2.json', "w") as f:
    json.dump(questions, f)