## Using machine learning to predict the winner of the Super Bowl
### Below is my analysis of NFL data to predict the winner of the Super Bowl between the Kansas City Chiefs and the Philadelphia Eagles

In [154]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

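# Load the data into a pandas DataFrame
data = pd.read_csv("NFL_data.csv")
# Note: datacopy is a second name for the same DataFrame, not an independent
# copy, so the "Result" encoding below also shows up in datacopy
datacopy = data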

# Encode the "Result" column as 1 for "Win" and 0 for "Loss"
data["Result"] = data["Result"].apply(lambda x: 1 if x == "Win" else 0)

# Drop the target ("Result") and the "Team" name from the input features
# (the saved row index "Unnamed: 0" and "Year" remain as features)
X = data.drop(["Result", "Team"], axis=1)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, data["Result"], test_size=0.2, random_state=42)

# Train a logistic regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = model.predict(X_test)

# Calculate the accuracy, precision, recall, and F1 score of the model
# (zero_division=1 reports an undefined metric as 1.0 instead of warning)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, zero_division=1)
recall = recall_score(y_test, y_pred, zero_division=1)
f1 = f1_score(y_test, y_pred, zero_division=1)

# Print the evaluation metrics
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)

Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
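
A perfect score on every metric is worth a second look. Only one team per season wins the Super Bowl, so a 20% test split can easily contain no winners at all. The sketch below uses made-up labels (`y_true_demo` and `y_pred_demo` are hypothetical, not taken from the notebook's data) to show that a model which always predicts 0 then scores 1.0 on all four metrics when `zero_division=1` is set. Whether this is what happened in the run above is an assumption that would need checking against `y_test`.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical test split that happens to contain no Super Bowl winners
y_true_demo = [0, 0, 0, 0, 0]
# A model that always predicts "not the winner"
y_pred_demo = [0, 0, 0, 0, 0]

# Precision, recall, and F1 are undefined here (no positives at all),
# so zero_division=1 makes each of them report 1.0
demo_scores = {
    "accuracy": accuracy_score(y_true_demo, y_pred_demo),
    "precision": precision_score(y_true_demo, y_pred_demo, zero_division=1),
    "recall": recall_score(y_true_demo, y_pred_demo, zero_division=1),
    "f1": f1_score(y_true_demo, y_pred_demo, zero_division=1),
}
print(demo_scores)
```

Printing `y_test.value_counts()` in the notebook, or passing `stratify=data["Result"]` to `train_test_split`, would show whether the test set contains any winners.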


In [155]:
datacopy

Unnamed: 0,Team,WinLoss perc,PD,Year,FGM,FG_perc,RedZone_perc,playoff_win_perc,Score_perc,Turnover_perc,RushYperG,PassYperG,PointperG,possperG,YallowedperG,PointallowedperG,perc_punt_20,Result
0,New England Patriots,68.8,111,2018,27,84.4,59.6,68.750000,40.8,9.2,127.3,266.1,27.2,0.518056,359.1,20.3,36.000000,1
1,Miami Dolphins,43.8,-114,2018,18,90.0,51.6,0.000000,28.3,12.5,108.6,181.2,19.9,0.468333,391.1,27.1,40.200000,0
2,Buffalo Bills,37.5,-105,2018,22,78.6,59.5,0.000000,26.4,15.5,124.0,174.6,16.8,0.508056,294.1,23.4,32.966667,0
3,New York Jets,25.0,-108,2018,33,91.7,44.4,0.000000,32.0,14.7,101.4,197.8,20.8,0.486111,380.4,27.6,28.000000,0
4,Baltimore Ravens,62.5,102,2018,35,89.7,55.9,62.500000,40.7,10.7,152.6,222.4,24.3,0.548333,292.9,17.9,42.400000,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
123,Carolina Panthers,29.4,-100,2021,26,89.7,53.2,0.000000,29.2,14.9,108.4,190.5,17.9,0.511944,305.9,23.8,21.100000,0
124,Los Angeles Rams,70.6,88,2021,32,94.1,60.0,70.588235,45.9,12.2,99.0,273.1,27.1,0.484167,344.9,21.9,47.800000,1
125,Arizona Cardinals,64.7,83,2021,30,81.1,60.0,64.705882,44.7,7.8,122.1,251.5,26.4,0.520833,329.2,21.5,20.100000,0
126,San Francisco 49ers,58.8,62,2021,27,84.4,66.7,58.823529,41.2,11.5,127.4,248.3,25.1,0.519167,310.0,21.5,38.000000,0


The data table contains performance metrics and results for NFL teams across the 2018 through 2021 seasons. Each row represents one team in one season, and the columns describe that team's performance in that season.

Here is a breakdown of what each column in the table represents:

* Team: The name of the football team.
* WinLoss perc: The team's win-loss percentage for the given year.
* PD: The team's point differential for the given year.
* Year: The year the data is from.
* FGM: The number of field goals the team made in the given year.
* FG_perc: The team's field goal percentage (field goals made out of attempts) for the given year.
* RedZone_perc: The team's red zone touchdown percentage for the given year.
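* playoff_win_perc: The team's win percentage if it reached the playoffs. In the rows shown it matches WinLoss perc for playoff teams and is 0 otherwise.
* Score_perc: The team's scoring percentage, most likely the share of offensive drives that ended in a score.
* Turnover_perc: The team's turnover percentage, most likely the share of offensive drives that ended in a turnover.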
* RushYperG: The team's rushing yards per game for the given year.
* PassYperG: The team's passing yards per game for the given year.
* PointperG: The team's points per game for the given year.
* possperG: The team's average time of possession per game for the given year. The values near 0.5 suggest it is stored as a fraction of an hour (0.518 is about 31 minutes).
* YallowedperG: The team's yards allowed per game for the given year.
* PointallowedperG: The team's points allowed per game for the given year.
* perc_punt_20: The percentage of the team's punts that were downed inside the opponent's 20-yard line.
* Result: Whether the team won the Super Bowl that season ("Win", encoded as 1) or not ("Loss", encoded as 0).
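
The column descriptions can be sanity-checked against the data itself. The sketch below rebuilds a few of the rows printed above into a small DataFrame (`sample` is a hypothetical name, and only five columns are copied) and checks two things: that `Result` marks a single winner per season, and that `playoff_win_perc` tracks `WinLoss perc` for playoff teams in these rows.

```python
import pandas as pd

# A handful of rows copied from the table printed above
sample = pd.DataFrame({
    "Team": ["New England Patriots", "Miami Dolphins", "Baltimore Ravens",
             "Los Angeles Rams", "Carolina Panthers"],
    "Year": [2018, 2018, 2018, 2021, 2021],
    "WinLoss perc": [68.8, 43.8, 62.5, 70.6, 29.4],
    "playoff_win_perc": [68.75, 0.0, 62.5, 70.588235, 0.0],
    "Result": [1, 0, 0, 1, 0],
})

# Number of rows flagged as Super Bowl winner in each season
winners_per_year = sample.groupby("Year")["Result"].sum()
print(winners_per_year.to_dict())

# For teams with a non-zero playoff_win_perc, compare it with WinLoss perc
made_playoffs = sample["playoff_win_perc"] > 0
gap = (sample.loc[made_playoffs, "WinLoss perc"]
       - sample.loc[made_playoffs, "playoff_win_perc"]).abs()
print(gap.max())
```

Running the same two checks on the full `data` DataFrame would confirm whether the pattern holds beyond these sample rows.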

In [157]:
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Load the data from the CSV file
sos = pd.read_csv('sos.csv')

# Create a new column called "Outcome" that holds the team with the lower SOS
# score in each game: the winner if its SOS is lower, otherwise the loser
sos['Outcome'] = sos.apply(lambda x: x['Winner'] if x['Winner SOS (ESPN)'] < x['Loser SOS (ESPN)'] else x['Loser'], axis=1)

# Create a logistic regression model to predict the outcome based on the SOS scores
model = LogisticRegression(max_iter=1000)
model.fit(sos[['Loser SOS (ESPN)', 'Winner SOS (ESPN)']], sos['Outcome'])

# Prompt the user to enter the names of the two teams they want to compare
team1 = input("Enter the name of team 1: ")
team2 = input("Enter the name of team 2: ")

# Check whether each team is found in the `Winner` column of the `sos` DataFrame
if not sos['Winner'].isin([team1]).any():
    print(f"Error: {team1} not found in the dataset.")
elif not sos['Winner'].isin([team2]).any():
    print(f"Error: {team2} not found in the dataset.")
else:
    # Look up the SOS scores for the two teams
    team1_sos = sos.loc[sos['Winner'] == team1, 'Winner SOS (ESPN)'].iloc[0]
    team2_sos = sos.loc[sos['Winner'] == team2, 'Winner SOS (ESPN)'].iloc[0]

    # Predict the winner based on the SOS scores; the features are passed in
    # the same order used for training: [Loser SOS, Winner SOS]
    if team1_sos > team2_sos:
        winner = model.predict([[team2_sos, team1_sos]])[0]
    elif team2_sos > team1_sos:
        winner = model.predict([[team1_sos, team2_sos]])[0]
    else:
        # Equal SOS scores: no prediction is made
        winner = "Tie"
    # Print the predicted winner
    print(f"The predicted winner between {team1} and {team2} is:", winner)


Enter the name of team 1: Kansas City Chiefs
Enter the name of team 2: Philadelphia Eagles
The predicted winner between Kansas City Chiefs and Philadelphia Eagles is: Kansas City Chiefs


1. The code loads the data from the "sos.csv" file into a pandas DataFrame called sos.
2. A new column called "Outcome" is added to the sos DataFrame. For each game it holds the name of the team with the lower SOS score: the winner if its SOS is lower than the loser's, otherwise the loser.
3. A logistic regression model is trained to predict "Outcome" from the two SOS scores of each game.
4. The user is prompted to enter the names of the two teams they want to compare.
5. The code checks that both teams appear in the Winner column of the sos DataFrame. If either is missing, an error message is printed to the console.
6. Otherwise, each team's SOS score is looked up from the first game it won, and the trained model is used to predict the winner from those two scores.
7. The predicted winner is printed to the console.
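
One limitation of the lookup step is that it only searches the Winner column, so a team that appears only as a loser would be reported as missing. A minimal sketch of a lookup that also falls back to the Loser column is shown below. The helper `lookup_sos` and the toy `sos_demo` table are hypothetical: the column names come from the code above, but the team names and SOS values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for sos.csv: team names and SOS values are made up
sos_demo = pd.DataFrame({
    "Winner": ["Team A", "Team C"],
    "Loser": ["Team B", "Team A"],
    "Winner SOS (ESPN)": [0.48, 0.52],
    "Loser SOS (ESPN)": [0.55, 0.48],
})

def lookup_sos(frame, team):
    """Return the first SOS value recorded for `team`, or None if absent."""
    column_pairs = (
        ("Winner", "Winner SOS (ESPN)"),
        ("Loser", "Loser SOS (ESPN)"),
    )
    # Try the Winner column first, then fall back to the Loser column
    for name_col, sos_col in column_pairs:
        rows = frame.loc[frame[name_col] == team, sos_col]
        if not rows.empty:
            return float(rows.iloc[0])
    return None

print(lookup_sos(sos_demo, "Team B"))  # found only in the Loser column
print(lookup_sos(sos_demo, "Team Z"))  # not in the table at all
```

Returning `None` for an unknown team lets the caller decide how to report the error, rather than printing from inside the helper.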