<a href="https://colab.research.google.com/github/akjieettt/data-science-final-project/blob/main/DataScienceProject.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# F1 Pit Stop Strategy Analysis: How Pit Stops Affect Race Outcomes

**Group Members**: Hrishi Kabra and Kiet Huynh

**Project Website**: https://akjieettt.github.io/data-science-final-project/

# **Collaboration Plan**

**Team Coordination:**
- Set up a private GitHub repository to coordinate all code, share datasets, and track progress
- Each member works on separate branches to implement features, which are merged via pull requests after code review to ensure consistency

**Technologies Used:**
- Version Control: Git and GitHub for source code management and collaboration
- Development Environment: Visual Studio Code Live Share, Google Colab, and Jupyter Notebooks for data analysis and prototyping
- Communication Tools: Small Family Collaboration Hub for in-person discussions, FaceTime for remote discussions, and Google Docs for shared notes

**Meeting Schedule:**
- Meet in person 2-3 times per week for 1-3 hours per session to discuss progress, solve problems, and coordinate tasks
- Outside of scheduled meetings, we communicate asynchronously via iMessage to stay aligned and share updates

**Task Management:**
- Tasks are divided based on expertise and interest
- Progress is tracked via a shared progress table (in a spreadsheet) to ensure deadlines are met and responsibilities are clear

# **Milestone 1**

For Milestone 1 you should generate a roughly 1 page writeup (~500 words) listing a partner and one to three datasets that you are considering working with and why. For each dataset you should generate at least one question you hope to answer with that data as well as a small amount of ETL including 3-5 interesting stats and one graph. This is just an outline to make sure you are thinking and is not a commitment in any way. This will be published on your GitHub IO page so this also makes sure you’ve figured out how to get it uploaded!

You must also include a short collaboration plan describing how you are working together, what technologies you are using, and when / how often you are meeting to work on this project. Examples include: we set up a private GitHub repo to coordinate code and we met on Zoom X times... or even we used Live Share for VS Code, Teletype for Atom, or RemoteCollab for Sublime. Failure to turn in a collaboration plan that shows you coordinated will be a loss of professionalism points. The turned-in result will need to reflect the understanding of both students.

You should load one of these datasets and parse it into shape using the principles of tidy data discussed in class and display the data table in a reasonable format that demonstrates what data you have. This is to show that you have figured out how to get the data into your system and does not need to be a final version, but it should show that you can read in a data source for your project. You should clearly discuss the data and what challenges you had in formatting it.

You should submit the notebook through Canvas. In the absolute first cell of your notebook you must include your names, project title, and a hyperlink to your webpage at github.io; the webpage must be publicly readable on the internet (i.e., live) and must contain the same work that is in the submitted notebook. That is: the first cell of your notebook must be a markdown cell with a hyperlink to the generated webpage up at yourname.github.io. If this is not correct you will lose points. After this first cell you should continue with the other requirements including a description of your project, links to the data and other relevant resources, a collaboration plan, and the project goals.


- (4 Points) Professionalism: You have used both code comments and markdown cells to professionally and clearly document your work, including having a clear and clean notebook; linking to resources and documents; and doing so with code that is reasonable and efficient. Your notebook is correct and contains the required links. In addition, you have written code that is interpretable: it contains comments where needed, variable names are reasonable, and the code is reasonable and efficient. You have followed directions to turn in the file, clearly labeling everything. You have cited all sources and how you used them in the written portion of your answers.
- (4 Points) Website: Website is up, link was submitted and is correct. Notebook is professional and clean, and includes the names of the group members, a title for the project, and other good practices, as this is publicly posted.
- (4 Points) Project Plan: Project plan is in place, relevant data is identified and links are provided, and there are draft questions or hypotheses that the student is going to explore. Plan clearly explains how the data could be used to answer the question and addresses whether or not other data is needed.
- (4 Points) Extraction, Transform, and Load (ETL): At least one dataset is loaded correctly using web scraping or other techniques. The data is discussed in terms of what it is and how it could be used to answer the question of study. Where the data comes from and how it is collected is clearly documented with links and other relevant details. The data is imported and tidy according to the principles discussed in class. Dtypes are set properly and displayed within the notebook; NaNs and other techniques are used following best practices discussed in class.
- (4 Points) Exploratory Data Analysis (EDA): You should do some light EDA on your data and show 3-5 interesting summary statistics or groups that relate to the questions you are asking of your data, along with a brief justification as to why these statistics are interesting and relevant. You should also give at least 1 graphic which shows some interesting property or distribution of your data (skew, histogram, scatter plot, etc.) and explain why this is relevant to your question.




In [40]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Loading all F1 datasets
df_circuits = pd.read_csv("/content/circuits.csv")
df_status = pd.read_csv("/content/status.csv")
df_lap_times = pd.read_csv("/content/lap_times.csv")
df_sprint_results = pd.read_csv("/content/sprint_results.csv")
df_drivers = pd.read_csv("/content/drivers.csv")
df_races = pd.read_csv("/content/races.csv")
df_constructors = pd.read_csv("/content/constructors.csv")
df_constructor_standings = pd.read_csv("/content/constructor_standings.csv")
df_qualifying = pd.read_csv("/content/qualifying.csv")
df_driver_standings = pd.read_csv("/content/driver_standings.csv")
df_constructor_results = pd.read_csv("/content/constructor_results.csv")
df_pit_stops = pd.read_csv("/content/pit_stops.csv")
df_seasons = pd.read_csv("/content/seasons.csv")
df_results = pd.read_csv("/content/results.csv")

# Plain string: no placeholders, so the f-prefix is unnecessary
print("Loaded all 14 datasets")

Loaded all 14 datasets
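
The fourteen `read_csv` calls above could also be driven from a list of table names. The following is a minimal sketch, not the notebook's actual loading code: the helper `load_tables`, the made-up sample rows, and the assumption that missing values are exported as the literal string `\N` are all illustrative and should be checked against the real files.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_tables(data_dir, table_names, na_values=("\\N",)):
    """Read <data_dir>/<name>.csv for each name into a dict of DataFrames.

    Hypothetical helper: na_values defaults to the literal string \\N,
    which is an ASSUMPTION about how the dataset marks missing values.
    """
    data_dir = Path(data_dir)
    tables = {}
    for name in table_names:
        # Treat the assumed placeholder as NaN so numeric columns parse as numbers
        tables[name] = pd.read_csv(data_dir / f"{name}.csv", na_values=list(na_values))
    return tables


# Tiny made-up sample (not real race data) so the sketch runs without the real files
with tempfile.TemporaryDirectory() as tmp:
    sample = (
        "raceId,driverId,stop,lap,milliseconds\n"
        "1,10,1,12,25000\n"
        "1,11,1,13,\\N\n"
    )
    Path(tmp, "pit_stops.csv").write_text(sample)
    tables = load_tables(tmp, ["pit_stops"])

df_pit_stops = tables["pit_stops"]

# Display dtypes and missing-value counts per column
print(df_pit_stops.dtypes)
print(df_pit_stops.isna().sum())
```

In the notebook itself, `data_dir` would be `/content` and `table_names` would list the fourteen file stems (`circuits`, `status`, `lap_times`, and so on). Keeping the tables in one dict makes it easy to loop over them when checking dtypes or missing values.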


In [41]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [42]:
%%shell
jupyter nbconvert --to html /content/drive/MyDrive/DataScienceProject.ipynb

[NbConvertApp] Converting notebook /content/drive/MyDrive/DataScienceProject.ipynb to html
[NbConvertApp] Writing 284454 bytes to /content/drive/MyDrive/DataScienceProject.html


