# RFP: Betting on the Bachelor

## Project Overview
You are invited to submit a proposal that answers the following question:

### Who will win season 29 of the Bachelor?

*All proposals must be submitted by **1/15/25 at 11:59 PM**.*

## Required Proposal Components

### 1. Data Description
In the code cell below, read in the data you plan on using to train and test your model. Call `info()` once you have read the data into a dataframe. Consider using some or all of the following sources:
- [Scrape Fandom Wikis](https://bachelor-nation.fandom.com/wiki/The_Bachelor) or [the official Bachelor website](https://bachelornation.com/shows/the-bachelor/)
- [Ask ChatGPT to generate it](https://chatgpt.com/)
- [Read in csv files like this](https://www.kaggle.com/datasets/brianbgonz/the-bachelor-contestants?select=contestants.csv)

*Note: a level 5 dataset contains at least 1000 rows of non-null data. A level 4 dataset contains at least 500 rows of non-null data.*
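One possible way to check a dataset against the row thresholds above is sketched below. The toy rows, the `dataset_level` helper, and the reading of "non-null" as "no missing value in any column" are assumptions made for illustration and are not part of the RFP.

```python
import pandas as pd

# Hypothetical toy rows, not real contestant data.
df = pd.DataFrame({
    "Name": ["Contestant A", "Contestant B", "Contestant C"],
    "Birth Year": [1990, None, 1993],
    "Occupation": ["Nurse", "Teacher", "Dancer"],
})

# info() prints each column's dtype and non-null count.
df.info()

# Count rows that have no missing value in any column.
complete_rows = len(df.dropna())


def dataset_level(non_null_rows):
    """Hypothetical helper: map a non-null row count to the rubric level."""
    if non_null_rows >= 1000:
        return 5
    if non_null_rows >= 500:
        return 4
    return None  # the RFP does not define levels below 500 rows


print("Complete rows:", complete_rows)
print("Dataset level:", dataset_level(complete_rows))
```

With real data, the same two calls (`info()` and `dropna()`) would be applied to the dataframe built from the scraped or downloaded source.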

In [1]:
# Read data into a dataframe
# Don't forget to call info()!

import pandas
import requests
from bs4 import BeautifulSoup



In [9]:
contestants = {
    "Name" : [],
    "Birth Year" : [],
    "Hometown" : [],
    "Occupation" : [],
    "Season" : [],
    "Eliminated": []
}

def getHtml(url):
    try:
        html = requests.get(url)
        html.raise_for_status()
        return html
    except requests.exceptions.RequestException as e:
        print(f"Failed to get url data: {url}")
        raise SystemExit
    
def tableImplContestants(soup, seasonNum):
    global contestants

    try:
        table = soup.find('table', class_='article-table')
        
        if table:
            table = table.find_all('tr')[1:]
        else:
            table = soup.find('table', class_='fandom-table').find_all('tr')[1:]
    except Exception as e:
        print(f"Error on season {seasonNum}: {e}")
        return

    for row in table:
        try:
            columns = row.find_all('td')

            contestants['Name'].append(columns[0].text)
            contestants['Hometown'].append(columns[2].text)
            contestants['Occupation'].append(columns[3].text)

            contestants['Season'].append(seasonNum)

            eliminationStatus = columns[4].text

            if ("Winner" in eliminationStatus):
                contestants['Eliminated'].append(0)
            else:
                contestants['Eliminated'].append(1)

            aElm = columns[0].find('a')
            if aElm:
                # Use double quotes inside the f-string so it also parses on Python versions before 3.12
                url = f'https://bachelor-nation.fandom.com{aElm.get("href")}'
                
                try:
                    html = getHtml(url)
                    soup = BeautifulSoup(html.text, 'html.parser')

                    birthData = soup.find('div', attrs={'data-source':'born'}).find('div').text.replace(',', '').split()

                    if (birthData[0] != 'age'):
                        contestants['Birth Year'].append(birthData[2])
                    else:
                        contestants['Birth Year'].append(birthData[1])
                except:
                    contestants['Birth Year'].append(columns[1].text)
            else:
                contestants['Birth Year'].append(columns[1].text)
        except Exception as e:
            print(f"Error for grabbing row on season {seasonNum}: {e}")
            continue

def galleryImplContestants(soup, seasonNum):
    print("Doing season", seasonNum)

    gallery = soup.find('div', class_='wikia-gallery wikia-gallery-caption-below wikia-gallery-position-center wikia-gallery-spacing-medium wikia-gallery-border-small wikia-gallery-captions-center wikia-gallery-caption-size-medium')

    women = gallery.find_all('div', class_='wikia-gallery-item')
    

    for girl in women:
        womenData = girl.find('div', class_='lightbox-caption')

        print(womenData.find('a').text)

        br_values = []

        for br in womenData.find_all('br'):
            try:
                br_values.append(br.next_sibling.strip())
            except:
                print("Error")


        # print(br_values)
        
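        contestants['Name'].append(womenData.find('a').text)
        contestants['Hometown'].append(br_values[1])
        contestants['Occupation'].append(br_values[2])
        contestants['Season'].append(seasonNum)
        # Placeholder: this function does not parse an elimination status,
        # and every column needs the same length to build a dataframe.
        contestants['Eliminated'].append(None)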


    
        # Use double quotes inside the f-string so it also parses on Python versions before 3.12
        url = f'https://bachelor-nation.fandom.com{womenData.find("a").get("href")}'
        
        try:
            html = getHtml(url)
            soup = BeautifulSoup(html.text, 'html.parser')

            birthData = soup.find('div', attrs={'data-source':'born'}).find('div').text.replace(',', '').split()

            if (birthData[0] != 'age'):
                contestants['Birth Year'].append(birthData[2])
            else:
                contestants['Birth Year'].append(birthData[1])
        except:
            contestants['Birth Year'].append(br_values[0])
        




bachelors = { # Note: Age is whenever the person was the bachelor
    "Name" : [],
    "Birth Year" : [],
    "Hometown" : [],
    "Occupation" : [],
    "Season" : [] 
}



# Get a list of all the seasons
seasonList = []

html = getHtml("https://bachelor-nation.fandom.com/wiki/Category:The_Bachelor_seasons")

soup = BeautifulSoup(html.text, 'html.parser')

for season in soup.find_all('li', class_="category-page__member")[1:]:
    seasonList.append("/wiki/" + season.text.strip())

# Iterate through each season
# ... first, find the bachelor, then get the data on said bachelor
# ... then, get the contestants

for season in seasonList:
    seasonNum = season.replace("(", "").split()[-1][:-1]


    # Get html for the season

    html = getHtml(f'https://bachelor-nation.fandom.com{season}')

    soup = BeautifulSoup(html.text, 'html.parser')

    bachelor = soup.find('div', attrs={'data-source':'bachelor'}).find('a').get('href')

    # Get html for the bachelor

    bhtml = getHtml(f'https://bachelor-nation.fandom.com{bachelor}')

    bSoup = BeautifulSoup(bhtml.text, 'html.parser')

    bachelors['Name'].append(bSoup.find('div', attrs={'data-source':'name'}).find('div').text)
    bachelors['Hometown'].append(bSoup.find('div', attrs={'data-source':'hometown'}).find('div').text)
    bachelors['Occupation'].append(bSoup.find('div', attrs={'data-source':'occupation'}).find('div').text)

    bachelors['Birth Year'].append(bSoup.find('div', attrs={'data-source':'born'}).find('div').text.replace(",", "").split()[2])
    bachelors['Season'].append(seasonNum)

    # Get contestants

    # NOTE: Seasons 1-7 only have regular tables
    # NOTE: Season 8 is lacking data

    # if (int(seasonNum) < 8):
    #     tableImplContestants(soup, seasonNum)
    # el
    if(int(seasonNum) == 19):
        galleryImplContestants(soup, seasonNum)
        
        



        

Doing season 19
Whitney Bischoff
Becca Tilley
Kaitlyn Bristowe
Jade Roper
Carly Waddell
Britt Nilsson
Megan Bell
Kelsey Poe
Ashley Iaconetti
Mackenzie Deonigi
Samantha Steffen
Ashley Salter
Juelia Kinney
Nikki Delventhal
Jillian Anderson
Amber James
Tracy Darakis
Trina Scherenberg
Alissa Giambrone
Jordan Branch
Error
Kimberly Sherbach
Error
Tandra Steiner
Tara Eddings
Amanda Goerlitz
Bo Stanley
Brittany Fetkin
Kara Wilson
Michelle Davis
Nicole Meacham
Reegan Cornwell


### 2. Training Your Model
In the cell seen below, write the code you need to train a linear regression model. Make sure you display the equation of the plane that best fits your chosen data at the end of your program. 

*Note: level 5 work trains a model using only the Python standard library and pandas. A level 5 model is trained with at least two features, where one of the features begins as a categorical value (e.g. occupation, hometown, etc.). A level 4 model uses external libraries like scikit-learn or NumPy.*
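As a rough illustration of the level 5 requirement, the sketch below one-hot encodes a categorical feature and fits a plane by solving the normal equations with the standard library only. The toy data, the choice of "weeks lasted" as the target, and the helper names `one_hot`, `solve`, and `fit` are all assumptions made for this example. It is one of several ways to do this and not a required approach.

```python
def one_hot(values):
    """Encode categories as 0/1 columns, dropping the first (sorted) category as the baseline."""
    kept = sorted(set(values))[1:]
    rows = [[1.0 if v == c else 0.0 for c in kept] for v in values]
    return kept, rows


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit(X, y):
    """Least squares fit via the normal equations (X^T X) w = X^T y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical toy data: age, occupation, and weeks lasted on the show.
ages = [25, 30, 22, 28, 35, 27]
occupations = ["Nurse", "Teacher", "Dancer", "Nurse", "Teacher", "Dancer"]
weeks = [7.5, 6.0, 4.2, 7.8, 6.5, 4.7]

kept, dummies = one_hot(occupations)
# Each row is [intercept term, age, one 0/1 column per kept occupation].
X = [[1.0, float(age)] + d for age, d in zip(ages, dummies)]
weights = fit(X, weeks)

names = ["age"] + [f"occupation_{c}" for c in kept]
terms = " + ".join(f"{w:.2f}*{name}" for w, name in zip(weights[1:], names))
print(f"weeks = {weights[0]:.2f} + {terms}")
```

Dropping one category as a baseline keeps the columns linearly independent, so the normal equations have a unique solution. With pandas available, `pandas.get_dummies` could replace the hand-written `one_hot` helper.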

In [40]:
# Train model here.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))

# Don't forget to display the equation of the plane that best fits your data!

Mean Squared Error: 0.0


### 3. Testing Your Model
In the cell seen below, write the code you need to test your linear regression model. 

*Note, a model is considered a level 5 if it achieves at least 60% prediction accuracy or achieves an RMSE of 2 weeks or less.*
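The two metrics named above can be computed with the standard library alone. The sketch below is a minimal example with made-up numbers. The function names `rmse` and `percent_accuracy` and the sample values are assumptions for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def percent_accuracy(actual, predicted):
    """Percentage of predictions that exactly match the actual labels."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return 100.0 * correct / len(actual)


# Hypothetical values: weeks each contestant lasted vs. the model's estimate.
actual_weeks = [3, 5, 8, 10]
predicted_weeks = [4, 5, 6, 9]
rmse_value = rmse(actual_weeks, predicted_weeks)

# Hypothetical labels: 1 = eliminated, 0 = winner.
actual_labels = [1, 1, 0, 1]
predicted_labels = [1, 0, 0, 1]
accuracy = percent_accuracy(actual_labels, predicted_labels)

print(f"RMSE: {rmse_value:.2f} weeks")
print(f"Accuracy: {accuracy:.0f}%")
```

RMSE suits a numeric target such as weeks lasted, while percent accuracy suits a yes/no target such as eliminated versus winner. Which one applies depends on what the model predicts.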

In [2]:
# Test model here.

### 4. Final Answer

In the first cell seen below, state the name of your predicted winner. 
In the second cell seen below, justify your prediction using an evaluation technique like RMSE or percent accuracy.

#### State the name of your predicted winner here.

#### Justify your prediction here.