# RFP: Betting on the Bachelor

## Project Overview
You are invited to submit a proposal that answers the following question:

### Who will win season 29 of the Bachelor?

*All proposals must be submitted by **1/15/25 at 11:59 PM**.*

## Required Proposal Components

### 1. Data Description
In the code cell below, read in the data you plan on using to train and test your model. Call `info()` once you have read the data into a dataframe. Consider using some or all of the following sources:
- [Scrape Fandom wikis](https://bachelor-nation.fandom.com/wiki/The_Bachelor) or [the official Bachelor website](https://bachelornation.com/shows/the-bachelor/)
- [Ask ChatGPT to generate it](https://chatgpt.com/)
- [Read in CSV files like this](https://www.kaggle.com/datasets/brianbgonz/the-bachelor-contestants?select=contestants.csv)

*Note: a level 5 dataset contains at least 1000 rows of non-null data. A level 4 dataset contains at least 500 rows of non-null data.*
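
The row thresholds in the note above can be checked in code before you commit to a dataset. The sketch below is one possible reading, assuming that "rows of non-null data" means rows with no missing value in any column. The helper name `dataset_level` and the toy frame are hypothetical and not part of the assignment.

```python
from typing import Optional

import pandas as pd


def dataset_level(df: pd.DataFrame) -> Optional[int]:
    """Return 5, 4, or None based on the number of fully non-null rows.

    Assumption: a row only counts if none of its columns is missing.
    """
    complete_rows = len(df.dropna())  # rows with no missing value in any column
    if complete_rows >= 1000:
        return 5
    if complete_rows >= 500:
        return 4
    return None


# Hypothetical toy frame: 600 rows, 50 of which are missing an age.
toy = pd.DataFrame({
    "Name": [f"Contestant {i}" for i in range(600)],
    "Age": [None if i < 50 else 25.0 for i in range(600)],
})
print(dataset_level(toy))  # 550 complete rows -> 4
```

Calling `dataset_level(df)` on your own dataframe right after `df.dropna()` shows whether the data you read in reaches either threshold.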

In [9]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

df = pd.read_csv("contestants.csv")
df = df.dropna()
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 27 entries, 393 to 422
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Name        27 non-null     object 
 1   Age         27 non-null     float64
 2   Occupation  27 non-null     object 
 3   Hometown    27 non-null     object 
 4   Height      27 non-null     float64
 5   ElimWeek    27 non-null     float64
 6   Season      27 non-null     int64  
dtypes: float64(3), int64(1), object(3)
memory usage: 1.7+ KB


In [2]:
df.head()

|  | Name | Age | Occupation | Hometown | Height | ElimWeek | Season |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 393 | Alexis | 23.0 | Aspiring Dolphin Trainer | Secaucus, NJ | 66.0 | 5.0 | 21 |
| 394 | Angela | 26.0 | Model | Greenville, SC | 67.0 | 1.0 | 21 |
| 395 | Astrid | 26.0 | Plastic Surgery Office Manager | Tampa, FL | 67.5 | 4.0 | 21 |
| 396 | Briana | 28.0 | Surgical Unit Nurse | Salt Lake City, UT | 64.0 | 1.0 | 21 |
| 397 | Brittany | 26.0 | Travel Nurse | Santa Monica, CA | 62.0 | 3.0 | 21 |


In [3]:
dfb = pd.read_csv(r"bachelors.csv")
dfb = dfb.dropna()
dfb.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 14 to 15
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Name      2 non-null      object 
 1   Age       2 non-null      int64  
 2   Hometown  2 non-null      object 
 3   Height    2 non-null      float64
 4   Season    2 non-null      int64  
dtypes: float64(1), int64(2), object(2)
memory usage: 96.0+ bytes


In [10]:
url = "https://bachelor-nation.fandom.com/wiki/The_Bachelor"
response = requests.get(url)
html_content = response.text
soup = BeautifulSoup(html_content, "html.parser")

In [20]:
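# Grab the first table body on the page and all of its rows.
tbody = soup.find('tbody')

rows = tbody.find_all('tr')

# Collect the stripped text of every <td> cell; header rows (<th> only) yield empty lists.
data = []
for row in rows:
    cols = row.find_all('td')
    cols = [col.text.strip() for col in cols]
    data.append(cols)

dfs = pd.DataFrame(data)

# dropna() removes rows with any missing cell, including the empty header row.
dfs = dfs.dropna()
dfs.columns = ['Season', 'Original Run', 'Bachelor', 'Winner', 'Runner-Up', 'Proposal', 'Still Together']
dfs.head()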

|  | Season | Original Run | Bachelor | Winner | Runner-Up | Proposal | Still Together |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | March 25 – April 25, 2002 | Alex Michel | Amanda Marsh | Trista Rehn | No | No |
| 2 | 2 | September 25 – November 20, 2002 | Aaron Buerge | Helene Eksterowicz | Brooke Smith | Yes | No |
| 3 | 3 | March 24 – May 21, 2003 | Andrew Firestone | Jennifer Schefft | Kirsten Buschbacher | Yes | No |
| 4 | 4 | September 24 – November 20, 2003 | Bob Guiney | Estella Gardinier | Kelly Kuharski | No | No |
| 5 | 5 | April 7 – May 26, 2004 | Jesse Palmer | Jessica Bowlin | Tara Huckeby | No | No |
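
The scraping cell above names its columns by hand, which breaks if the wiki table gains or loses a column. As an alternative sketch, the column names can be read from the table's own header row. This version swaps BeautifulSoup for the standard library's `html.parser` so that it runs without third-party packages. The class `TableRows`, the function `parse_table`, and the inline `sample` HTML are illustrative assumptions, and the real wiki markup may be structured differently.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return one dict per body row, keyed by the header row's cell text."""
    parser = TableRows()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Inline stand-in for a downloaded page; only the first two seasons are shown.
sample = """
<table>
  <tr><th>Season</th><th>Bachelor</th><th>Winner</th></tr>
  <tr><td>1</td><td>Alex Michel</td><td><a href="#">Amanda Marsh</a></td></tr>
  <tr><td>2</td><td>Aaron Buerge</td><td>Helene Eksterowicz</td></tr>
</table>
"""
records = parse_table(sample)
print(records[0]["Winner"])  # Amanda Marsh
```

The resulting list of dicts can be passed straight to `pd.DataFrame(records)`.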


In [25]:
url = "https://bachelor-nation.fandom.com/wiki/The_Bachelor_(Season_1)"
response = requests.get(url)
html_content = response.text
soup = BeautifulSoup(html_content, "html.parser")

In [27]:
tbody = soup.find('tbody')

rows = tbody.find_all('tr')

data = []
for row in rows:
    cols = row.find_all('td')
    cols = [col.text.strip() for col in cols]
    data.append(cols)

dfs1 = pd.DataFrame(data)

# Clean the Season 1 frame (dfs1), not the all-seasons frame (dfs).
dfs1 = dfs1.dropna()
dfs1.columns = ['Name', 'Age', 'Hometown', 'Occupation', 'Eliminated']
dfs1.head()


### 2. Training Your Model
In the cell below, write the code you need to train a linear regression model. At the end of your program, display the equation of the plane that best fits your chosen data.

*Note: level 5 work trains a model using only the Python standard library and pandas. A level 5 model is trained with at least two features, where one of the features begins as a categorical value (e.g. occupation or hometown). A level 4 model uses external libraries like scikit-learn or NumPy.*
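
To make the level 5 requirement concrete, here is a minimal sketch of a two-feature fit that uses only the standard library. It one-hot encodes a categorical feature, then solves the normal equations $(X^\top X)\,w = X^\top y$ by Gaussian elimination. The function names (`one_hot`, `solve`, `fit_linear`) and the toy ages, occupations, and target values are hypothetical. It is one possible approach, not the required one.

```python
def one_hot(values):
    """Encode categories as 0/1 columns, dropping the alphabetically first
    category as the baseline so the columns stay independent of the intercept."""
    categories = sorted(set(values))
    kept = categories[1:]
    rows = [[1.0 if v == c else 0.0 for c in kept] for v in values]
    return rows, kept


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_linear(X, y):
    """Least squares via the normal equations.
    Returns [intercept, w1, w2, ...]."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical toy data, generated from ElimWeek = 10 - 0.2*Age + 1.5*is_Nurse.
ages = [23, 26, 28, 30, 25, 27]
occupations = ["Nurse", "Model", "Nurse", "Model", "Model", "Nurse"]
y = [10 - 0.2 * a + (1.5 if o == "Nurse" else 0.0) for a, o in zip(ages, occupations)]

flags, kept = one_hot(occupations)
X = [[age] + row for age, row in zip(ages, flags)]
w = fit_linear(X, y)
names = ["Age"] + [f"Occupation={c}" for c in kept]
print("ElimWeek =", f"{w[0]:.2f}", *[f"+ ({c:.2f})*{n}" for c, n in zip(w[1:], names)])
```

Dropping one category as the baseline keeps the one-hot columns from summing to the intercept column, which would otherwise make the system singular.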

In [4]:
X = df[["Age"]]
y = df["ElimWeek"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [5]:
model = LinearRegression()
model.fit(X_train, y_train)

LinearRegression()

In [6]:
slope = model.coef_[0]
intercept = model.intercept_
print(slope, intercept)

-0.5357142857142858 17.31746031746032
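
The two numbers printed above are the slope and intercept, but the assignment asks for the equation itself. A small formatting helper, sketched here with the hypothetical name `format_equation`, turns them into a readable line. The values passed in are the ones printed by the cell above.

```python
def format_equation(target, intercept, coefs, names, digits=3):
    """Build a string like 'y = b0 - b1 * x1 + b2 * x2' from fitted parameters."""
    terms = [f"{intercept:.{digits}f}"]
    for coef, name in zip(coefs, names):
        sign = "-" if coef < 0 else "+"
        terms.append(f"{sign} {abs(coef):.{digits}f} * {name}")
    return f"{target} = " + " ".join(terms)


# Slope and intercept copied from the output of the training cell.
print(format_equation("ElimWeek", 17.31746031746032, [-0.5357142857142858], ["Age"]))
# ElimWeek = 17.317 - 0.536 * Age
```

With a scikit-learn model, the same call would take `model.intercept_`, `model.coef_`, and the list of feature column names. With two features, the result is the equation of a plane.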


### 3. Testing Your Model
In the cell below, write the code you need to test your linear regression model.

*Note: a model is considered level 5 if it achieves at least 60% prediction accuracy or an RMSE of 2 weeks or less.*

In [7]:
# Test model here.
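
As a starting point for the test cell, the sketch below computes both metrics named in the note with the standard library only. RMSE is $\sqrt{\tfrac{1}{n}\sum_i (y_i - \hat{y}_i)^2}$. "Prediction accuracy" is not defined for a regression model in this RFP, so the sketch assumes a prediction counts as correct when it rounds to the true elimination week. The function names and the toy values are hypothetical. With the scikit-learn model trained above, `y_pred` would come from `model.predict(X_test)`.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rounded_accuracy(y_true, y_pred):
    """Share of predictions that round to the true week (an assumed definition)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if round(p) == round(t))
    return hits / len(y_true)


def meets_level_5(y_true, y_pred):
    """Level 5 per the note: accuracy of at least 60% or RMSE of at most 2 weeks."""
    return rounded_accuracy(y_true, y_pred) >= 0.6 or rmse(y_true, y_pred) <= 2


# Hypothetical true and predicted elimination weeks for four test contestants.
y_true = [1, 3, 5, 4]
y_pred = [2.0, 3.2, 4.1, 4.4]
print(rmse(y_true, y_pred), rounded_accuracy(y_true, y_pred), meets_level_5(y_true, y_pred))
```

On these toy values the RMSE is below one week while only half of the rounded predictions match, so the RMSE criterion is what qualifies the model.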

### 4. Final Answer

In the first cell below, state the name of your predicted winner.
In the second cell below, justify your prediction using an evaluation technique like RMSE or percent accuracy.
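
One way to turn a fitted model into a single name is to predict an elimination week for every Season 29 contestant and pick the one predicted to last longest. The sketch below assumes a larger `ElimWeek` means a later exit and reuses the slope and intercept printed in Section 2. The contestant names and ages are hypothetical placeholders, not real cast data.

```python
def predict_elim_week(age, slope=-0.5357142857142858, intercept=17.31746031746032):
    """Predicted elimination week from the one-feature model fitted in Section 2."""
    return slope * age + intercept


def predicted_winner(contestants):
    """contestants: list of (name, age) pairs.
    Returns the name with the highest predicted elimination week."""
    return max(contestants, key=lambda c: predict_elim_week(c[1]))[0]


# Hypothetical placeholders; replace with the actual Season 29 cast.
season_29 = [("Contestant A", 24), ("Contestant B", 27), ("Contestant C", 31)]
print(predicted_winner(season_29))  # Contestant A
```

Because the fitted slope is negative, this one-feature model always favors the youngest contestant, which is worth acknowledging in your justification.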

#### State the name of your predicted winner here.

#### Justify your prediction here.