**If you lost points on the last checkpoint you can get them back by responding to TA/IA feedback**  

Update/change the relevant sections where you lost those points, make sure you respond on GitHub Issues to your TA/IA to call their attention to the changes you made here.

Please update your Timeline... no battle plan survives contact with the enemy, so make sure we understand how your plans have changed.

# COGS 108 - Data Checkpoint

# Names

- Sheena Patel
- Anna Chen
- Shreya Velagala
- Catherine Du
- Esther Liu

# Research Question

<b> What is the relationship between time and interest in various TikTok fashion trends? </b> <br>

In Depth RQ: <br>
How can we predict the level of interest in various TikTok fashion trends across New York, California, and Texas using TikTok trend interest score data from January 2019 to December 2023? Specifically, we want a model that takes the fashion trend name, monthly time data, and region as input and, for the specified input in 2024, predicts an interest rating from 0 to 100 along with an associated label: 'low' (interest score < 25), 'rising' (25 to < 50), 'popular' (50 to < 75), or 'trending' (interest score > 75). <br>
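
The label scheme in the question can be written as a small helper. This is only a minimal sketch: the function name `interest_label` is hypothetical, and because the thresholds do not say where a score of exactly 75 belongs, the sketch assumes it counts as 'trending'.

```python
def interest_label(score):
    """Map a 0-100 interest score to one of the four labels.

    Assumption: a score of exactly 75 is treated as 'trending'.
    """
    if not 0 <= score <= 100:
        raise ValueError("interest score must be between 0 and 100")
    if score < 25:
        return "low"
    if score < 50:
        return "rising"
    if score < 75:
        return "popular"
    return "trending"


# Example: label a few scores.
for s in (10, 30, 60, 90):
    print(s, interest_label(s))
```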

This question aims to develop a predictive model that evaluates the popularity of TikTok fashion trends in different regions and times, using a quantifiable interest rating system. The focus on a state-by-state analysis allows for a detailed understanding of regional preferences and trends over time. <br>

We specifically focus on New York, California, and Texas because they are three of the four most populous states in the US and also have high TikTok usage. Additionally, Pytrends, the API we use to collect data from Google Trends, limits the number of search queries we can make, so we can only represent around 3 states in this research. <br>

Interest Score/Interest Over Time Definition:  <br>
The "interest score" on Google Trends represents the relative popularity of a search query in a
specific region and time frame. It is indexed from 0 to 100, where 100 signifies the peak popularity
for the term. This score does not indicate the absolute search volume; it shows the search term's popularity relative to the highest point on the chart for the given region and time. A higher score means the term is closer to its peak popularity at that time, while a lower score indicates less interest. The data is useful for identifying trends and understanding how interest in certain topics changes over time. <br>
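
To make "relative to the highest point on the chart" concrete, the sketch below rescales a list of made-up raw counts so that the peak becomes 100. It is a simplified illustration only: the function name `relative_interest` and the input values are hypothetical, and Google's actual normalization involves additional steps that are not reproduced here.

```python
def relative_interest(raw_counts):
    """Rescale non-negative counts so the largest value maps to 100."""
    if not raw_counts:
        return []
    peak = max(raw_counts)
    if peak == 0:
        # No searches at all in the window: every score is 0.
        return [0 for _ in raw_counts]
    return [round(100 * count / peak) for count in raw_counts]


# Hypothetical weekly counts for one term in one region.
print(relative_interest([50, 200, 100]))
```

Because every value is divided by the peak of its own series, scores from two different terms or regions are not directly comparable in absolute terms.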

## Background and Prior Work

Our team is curious about how current fashion trends can be used to predict the rating of a piece of clothing. We aim to create a prediction model that takes types of clothing and dates as input and predicts the rating of a piece of clothing. We hope this model will have real-world applications, for example by allowing companies to predict ratings for different clothing pieces based on Gen Z fashion and thereby improve clothing purchase rates. Many datasets exist that contain information about women’s clothing, including product descriptions, reviews, and ratings. However, we would like to incorporate TikTok trends into our dataset so that TikTok fashion trends can serve as a data feature when predicting the rating for clothing. 

Currently, we have a few datasets, including Amazon Women's Fashion Dataset<a name="cite_ref-1"></a>[<sup>1</sup>](#cite_note-1) and Women's Ecommerce Clothing Reviews<a name="cite_ref-2"></a>[<sup>2</sup>](#cite_note-2), that seem relevant to our project goals. Based on our initial research, our ideal dataset would include columns for clothing type, clothing description, clothing review, clothing rating, and a label for each row representing which TikTok trend the clothing item falls into based on the clothing description. Additionally, we found other projects that have approached similar problems. One example is Amazon Women's Clothing Review<a name="cite_ref-1"></a>[<sup>1</sup>](#cite_note-1). In that project, the researcher performs EDA by analyzing various trends in the sentiment of Amazon clothing reviews. We plan to take a similar approach in our EDA by visualizing trends in rating prediction, but from the perspective of how TikTok trends and their view counts influence ratings. Another project with a similar approach is Rating prediction<a name="cite_ref-3"></a>[<sup>3</sup>](#cite_note-3), which performs rating prediction similar to what we would like to do. We plan to use this project as an example when creating our prediction model, and to use a similar approach for EDA, using TF-IDF to assign TikTok trends to different clothing descriptions. 

The first project<a name="cite_ref-3"></a>[<sup>3</sup>](#cite_note-3) found that the main trigrams from the reviews fall into the positive sentiment categories. For instance, "fit true size", "run true size", "fit just right", "love love love", "fit like glove", "usual wear size", and "every time wear" were some of the most prominent trigrams. This research is closer to a sentiment analysis and shows that reviews and ratings in its dataset skew positive. The second project<a name="cite_ref-1"></a>[<sup>1</sup>](#cite_note-1) performed EDA and analyzed which age groups give which kinds of reviews and ratings. It found that layering, lounge, and swim clothing tend to have the best reviews and, interestingly, that higher age groups gave worse ratings. What we found interesting about this project is that TF-IDF vectorization was used to convert the clothing descriptions into vectors to be passed as input to the prediction model. We expect to take a similar approach in our project.

1. <a name="cite_note-1"></a> [^](#cite_ref-1) Jaewook. (2022, October 13). Amazon reviews on Women Dresses. Kaggle. https://www.kaggle.com/code/jaewook704/amazon-reviews-on-women-dresses
2. <a name="cite_note-2"></a> [^](#cite_ref-2) Agarap, Abien Fred, and Nicapotato. (2018, February 3). Women’s e-Commerce Clothing Reviews. Kaggle. https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews
3. <a name="cite_note-3"></a> [^](#cite_ref-3) Matteoanzano. (2023, December 5). Customer review analysis with text mining. Kaggle. https://www.kaggle.com/code/matteoanzano111/customer-review-analysis-with-text-mining/input

# Hypothesis


These are a few possible hypotheses under exploration:
- TikTok trends that were more popular towards the end of quarantine (2019/2020), which we have data for, will be less popular in 2024 
- TikTok trends that were less popular towards the end of quarantine (2019/2020), which we have data for, may be more popular in 2024. 
- TikTok trends that involve more seasonal clothing, like festive clothing for Christmas or summer clothing like dresses and skirts, will be popular in their respective seasons in 2024. For instance, skirts, which are summer clothing pieces, are more likely to be popular in summer 2024, while sweaters, which are winter clothing items, are more likely to be popular in winter 2024. 
    - Trends with bright colors are more likely to have higher interest scores in the summer than in other seasons. 
    - Trends with knit material will have higher interest scores in winter and cotton material will have higher interest scores in summer.
    - Trends with more blouses will have higher interest scores in the summer and sweaters will have higher interest scores in winter. 
- TikTok trends that are considered to be the most popular of all time will continue to be very popular and have high interest scores throughout 2024. 
- TikTok trends that have been endorsed by celebrities and thus have had high interest scores in the past will continue to have high interest scores in 2024. 


# Data

## Data overview

We compiled a list of the top 30 TikTok trends from 1/1/2020 through 12/31/2023 using ChatGPT and external sources <a name="cite_ref-1"></a>[<sup>1</sup>](#cite_note-1) <a name="cite_ref-2"></a>[<sup>2</sup>](#cite_note-2), then searched for each trend name on Google Trends to see how its interest score changed over time. For each trend we recorded the “Interest over time” data, a value from 0 to 100 that represents search interest relative to the highest point on the chart (100 marks peak popularity) for the given region (the United States) and time (our aforementioned time frame). Google Trends also reports related queries that users searching for the trend name searched for. We collected 30 datasets, one for each of 30 top TikTok fashion trends, which together form our overall dataset of ~6500 data points. <br>


We describe the datasets below. We use the [Pytrends API](https://pypi.org/project/pytrends/) to gather each dataset in a for loop into a separate data structure, then combine all of the datasets with pd.concat into a single dataframe, stacking them row-wise (axis = 0), and clean the merged data together. The code for cleaning and combining the datasets is below the dataset descriptions. <br>
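
As a rough illustration of the gather-then-combine loop described above, the sketch below queries each keyword and region pair and stacks the results row-wise. It is not the project's actual code: the function name `collect_trends`, the output column names, and the `US-CA`/`US-NY`/`US-TX` geo codes are assumptions, and `client` is only expected to behave like a `pytrends.request.TrendReq` object (with `build_payload` and `interest_over_time`).

```python
import pandas as pd

# Assumed mapping from our region labels to Google Trends geo codes.
REGIONS = {"CA": "US-CA", "NY": "US-NY", "TX": "US-TX"}
TIMEFRAME = "2020-01-01 2023-12-31"
COLUMNS = ["date", "interest_score", "trend", "region"]


def collect_trends(client, keywords, regions=REGIONS, timeframe=TIMEFRAME):
    """Query each (keyword, region) pair and stack the results row-wise."""
    frames = []
    for keyword in keywords:
        for label, geo in regions.items():
            client.build_payload(kw_list=[keyword], timeframe=timeframe, geo=geo)
            raw = client.interest_over_time()
            if raw.empty:
                continue  # no search data returned for this pair
            frames.append(
                pd.DataFrame(
                    {
                        "date": raw.index,
                        "interest_score": raw[keyword].to_numpy(),
                        "trend": keyword,
                        "region": label,
                    }
                )
            )
    if not frames:
        return pd.DataFrame(columns=COLUMNS)
    # axis=0 appends rows; ignore_index gives one clean 0..n-1 index.
    return pd.concat(frames, axis=0, ignore_index=True)


# Usage with the real client (requires network access):
# from pytrends.request import TrendReq
# combined = collect_trends(TrendReq(hl="en-US", tz=360), ["Y2K", "Cottagecore"])
```

Passing the client in as an argument keeps the loop easy to try out offline with a stand-in object before making real requests.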



1. <a name="cite_note-1"></a> [^](#cite_ref-1) Howell, Carolyn. "TikTok Fashion Trends 2023: Unveiling the Hottest Styles." High Social, 31 Aug. 2023, www.highsocial.com/resources/tiktok-fashion-trends-2023-unveiling-the-hottest-styles/.
2. <a name="cite_note-2"></a> [^](#cite_ref-2) "Top TikTok Fashion Trends of 2023 (So Far)." Sweety High, www.sweetyhigh.com/read/top-tiktok-fashion-trends-2023-040323. Accessed 23 Feb. 2024.



<b> Dataset #1: Y2K </b>

Keyword: Y2K -> Keyword for one of the top TikTok Fashion Trends between 2019 and 2023 <br>
Dataset Name: Y2K <br>
Link to the dataset: [Google Trends Keyword Y2K](https://trends.google.com/trends/explore?date=2020-01-01%202023-12-31&geo=US&q=y2k&hl=en) <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>

Description: The dataset, gathered using Pytrends API by searching the keyword 'Y2K' on Google Trends, is structured in a dataframe. It includes metrics like interest scores over time, with data types encompassing datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is required for converting datetime to integer month values and creating columns for trends using one-hot encoding. This preprocessing is essential for feeding the trends and months into a Machine Learning prediction model. Additionally, interest scores will be categorized into labels such as 'low', 'rising', 'popular', or 'trending' for model output. <br>

<img src=Y2k.png width="300" height="200" alt="Interest Score Over Time on Google Trends for Y2K">
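
The cleaning steps named in the description above (datetime to integer month, one-hot encoding of the trend) could look roughly like the sketch below. The function name `prepare_features` and the column names `date`, `trend`, `interest_score`, and `region` are assumptions for illustration, and the toy rows are made up.

```python
import pandas as pd


def prepare_features(df):
    """Add an integer month column and one-hot encode the trend name."""
    out = df.copy()
    # Convert the datetime column to an integer month (1-12).
    out["month"] = pd.to_datetime(out["date"]).dt.month
    # One 0/1 indicator column per trend, e.g. trend_Y2K.
    out = pd.get_dummies(out, columns=["trend"], prefix="trend", dtype=int)
    return out.drop(columns=["date"])


# Toy input with made-up values.
toy = pd.DataFrame(
    {
        "date": ["2020-01-05", "2020-07-12"],
        "interest_score": [12, 80],
        "trend": ["Y2K", "Cottagecore"],
        "region": ["CA", "TX"],
    }
)
print(prepare_features(toy))
```

Keeping only the month discards the year, so the model sees seasonality but not long-run growth or decline; whether that trade-off is acceptable depends on the hypotheses being tested.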

<b> Dataset #2: Cottagecore </b>

Keyword: Cottagecore -> Keyword for one of the top TikTok Fashion Trends between 2019 and 2023 <br>
Dataset Name: Cottagecore <br>
Link to the dataset: [Google Trends Keyword Cottagecore](https://trends.google.com/trends/explore?date=2020-01-01%202023-12-31&geo=US&q=cottagecore&hl=en) <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>

Description: The dataset, gathered using Pytrends API by searching the keyword 'Cottagecore' on Google Trends, is structured in a dataframe. It includes metrics like interest scores over time, with data types encompassing datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is required for converting datetime to integer month values and creating columns for trends using one-hot encoding. This preprocessing is essential for feeding the trends and months into a Machine Learning prediction model. Additionally, interest scores will be categorized into labels such as 'low', 'rising', 'popular', or 'trending' for model output. <br>

<img src=Cottagecore.png width="300" height="200" alt="Interest Score Over Time on Google Trends for Cottagecore">

<b> Dataset #3: E-Girl </b>

Keyword: E-Girl -> Keyword for one of the top TikTok Fashion Trends between 2019 and 2023 <br>
Dataset Name: E-Girl <br>
Link to the dataset: [Google Trends Keyword E-Girl](https://trends.google.com/trends/explore?date=2020-01-01%202023-12-31&geo=US&q=e-girl&hl=en) <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>

Description: The dataset, gathered using Pytrends API by searching the keyword 'E-Girl' on Google Trends, is structured in a dataframe. It includes metrics like interest scores over time, with data types encompassing datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is required for converting datetime to integer month values and creating columns for trends using one-hot encoding. This preprocessing is essential for feeding the trends and months into a Machine Learning prediction model. Additionally, interest scores will be categorized into labels such as 'low', 'rising', 'popular', or 'trending' for model output. <br>

<img src=e-girl.png width="300" height="200" alt="Interest Score Over Time on Google Trends for E-Girl">

<b> Dataset #4: Vintage Thrift </b>

Keyword: Vintage Thrift -> Keyword for one of the top TikTok Fashion Trends between 2019 and 2023 <br>
Dataset Name: Vintage Thrift <br>
Link to the dataset: [Google Trends Keyword Vintage Thrift](https://trends.google.com/trends/explore?date=2020-01-01%202023-12-31&geo=US&q=vintage%20thrift&hl=en) <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>

Description: The dataset, gathered using Pytrends API by searching the keyword 'Vintage Thrift' on Google Trends, is structured in a dataframe. It includes metrics like interest scores over time, with data types encompassing datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is required for converting datetime to integer month values and creating columns for trends using one-hot encoding. This preprocessing is essential for feeding the trends and months into a Machine Learning prediction model. Additionally, interest scores will be categorized into labels such as 'low', 'rising', 'popular', or 'trending' for model output. <br>

<img src=VintageThrift.png width="300" height="200" alt="Interest Score Over Time on Google Trends for Vintage Thrift">

<b> Dataset #5: Fairycore  </b>

Keyword: Fairycore -> Keyword for one of the top TikTok Fashion Trends between 2019 and 2023 <br>
Dataset Name: Fairycore <br>
Link to the dataset: [Google Trends Keyword Fairycore](https://trends.google.com/trends/explore?date=2020-01-01%202023-12-31&geo=US&q=fairycore&hl=en) <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>

Description: The dataset, gathered using Pytrends API by searching the keyword 'Fairycore' on Google Trends, is structured in a dataframe. It includes metrics like interest scores over time, with data types encompassing datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is required for converting datetime to integer month values and creating columns for trends using one-hot encoding. This preprocessing is essential for feeding the trends and months into a Machine Learning prediction model. Additionally, interest scores will be categorized into labels such as 'low', 'rising', 'popular', or 'trending' for model output. <br>

<img src=fairycore.png width="300" height="200" alt="Interest Score Over Time on Google Trends for Fairycore">

<b> Dataset #6: Vanilla Girl </b>

Keyword: Vanilla Girl -> Keyword for one of the top TikTok Fashion Trends between 2019 and 2023 <br>
Dataset Name: Vanilla Girl <br> 
Link to the dataset: [Google Trends Keyword Vanilla Girl](https://trends.google.com/trends/explore?date=2020-01-01%202023-12-31&geo=US&q=vanilla%20girl&hl=en) <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>

Description: The dataset, gathered using Pytrends API by searching the keyword 'Vanilla Girl' on Google Trends, is structured in a dataframe. It includes metrics like interest scores over time, with data types encompassing datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is required for converting datetime to integer month values and creating columns for trends using one-hot encoding. This preprocessing is essential for feeding the trends and months into a Machine Learning prediction model. Additionally, interest scores will be categorized into labels such as 'low', 'rising', 'popular', or 'trending' for model output. <br>

<b> Dataset #7: Clean Girl Aesthetic </b>

Keyword: Clean Girl Aesthetic -> Keyword for one of the top TikTok Fashion Trends between 2019 and 2023 <br>
Dataset Name: Clean Girl Aesthetic <br>
Link to the dataset: [Google Trends Keyword Clean Girl Aesthetic](https://trends.google.com/trends/explore?date=2020-01-01%202023-12-31&geo=US&q=clean%20girl%20aesthetic&hl=en)  <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>
Description: The dataset, sourced using Pytrends API by searching 'Clean Girl Aesthetic' on Google Trends, is formatted as a dataframe. It features metrics such as interest scores over time, including datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). It requires data cleaning for transforming datetime into integer month values and for the creation of trend columns using one-hot encoding. This process is vital for integrating the trends and months into a Machine Learning prediction model. Further, the interest scores will be labeled as 'low', 'rising', 'popular', or 'trending' for the model's output. <br>


<b> Dataset #8: Blokecore  </b>

Keyword: Blokecore -> Keyword for one of the top TikTok Fashion Trends between 2019 and 2023 <br>
Dataset Name: Blokecore <br>
Link to the dataset: [Google Trends Keyword Blokecore](https://trends.google.com/trends/explore?date=2020-01-01%202023-12-31&geo=US&q=blokecore&hl=en) <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>
Description: The dataset, compiled using the Pytrends API with 'Blokecore' as the search keyword on Google Trends, is presented in a dataframe format. It includes metrics such as interest scores over time, consisting of datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Necessary data cleaning involves converting datetime to integer month values and adding columns for trends via one-hot encoding. This preparation is crucial for incorporating the trends and months into a Machine Learning prediction model. Interest scores will also be classified under labels like 'low', 'rising', 'popular', or 'trending' for the model's output. <br>

<b> Dataset #9: Barbie Challenge </b>

Keyword: Barbie Challenge -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Barbie Challenge <br>
Link to the dataset: [Google Trends Keyword Barbie Challenge](https://trends.google.com/trends/explore?date=2020-01-01%202023-12-31&geo=US&q=barbie%20challenge&hl=en)   <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>
Description: This dataset, sourced via Pytrends API with 'Barbie Challenge' from Google Trends, is in a dataframe format. It includes interest scores over time, featuring datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning involves converting datetime to integer month values and creating trend columns using one-hot encoding for integration into a Machine Learning prediction model. Interest scores are labeled as 'low', 'rising', 'popular', or 'trending'. <br>


<b> Dataset #10: Shirt Jackets </b>

Keyword: Shirt Jackets -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Shirt Jackets <br>
Link to the dataset: [Google Trends Keyword Shirt Jackets](https://trends.google.com/trends/explore?date=2020-01-01%202023-12-31&geo=US&q=shirt%20jackets&hl=en)  <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>
Description: Gathered using the Pytrends API, the 'Shirt Jackets' dataset from Google Trends is in dataframe form. It contains metrics like interest scores, datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). The data requires cleaning for datetime conversion into integer months and for trend columns addition via one-hot encoding, essential for Machine Learning prediction model input. Interest scores will be classified as 'low', 'rising', 'popular', or 'trending'. <br>

<b> Dataset #11: Balletcore </b>

Keyword: Balletcore -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Balletcore <br>
Link to the dataset: [Google Trends Keyword Balletcore](https://trends.google.com/trends/explore?date=2020-01-01%202023-12-31&geo=US&q=Balletcore&hl=en) <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>
Description: The Balletcore dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>

<b> Dataset #12: Coastal Grandmother </b>

Keyword: Coastal Grandmother -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Coastal Grandmother <br>
Link to the dataset: [Google Trends Keyword Coastal Grandmother](https://trends.google.com/trends/explore?date=2020-01-01%202023-12-31&geo=US&q=Coastal%20Grandmother&hl=en) <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>
Description: The Coastal Grandmother dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>


<b> Dataset #13: Gingham </b>

Keyword: Gingham -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Gingham <br>
Link to the dataset: Google Trends Keyword Gingham <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>
Description: The Gingham dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>

<b> Dataset #14: Maxi Skirts </b>

Keyword: Maxi Skirts -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Maxi Skirts <br>
Link to the dataset: Google Trends Keyword Maxi Skirts <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>
Description: The Maxi Skirts dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>

<b> Dataset #15: Corset </b>

Keyword: Corset -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Corset <br>
Link to the dataset: Google Trends Keyword Corset  <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>
Description: The Corset dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>


<b> Dataset #16: Leg Warmers </b>

Keyword: Leg Warmers -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Leg Warmers <br>
Link to the dataset: Google Trends Keyword Leg Warmers <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>
Description: The Leg Warmers dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>

<b> Dataset #17: Birkenstocks </b>

Keyword: Birkenstocks -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Birkenstocks <br>
Link to the dataset: Google Trends Keyword Birkenstocks <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>
Description: The Birkenstocks dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>


<b> Dataset #18: Cloud Slides </b> <br>

Keyword: Cloud Slides -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Cloud Slides <br>
Link to the dataset: Google Trends Keyword Cloud Slides <br>
Number of observations: 209  <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>
Description: The Cloud Slides dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>


<b> Dataset #19: Leather </b>

Keyword: Leather -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Leather  <br>
Link to the dataset: Google Trends Keyword Leather <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>
Description: The Leather dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>


<b> Dataset #20: Funky Pants </b>

Keyword: Funky Pants -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Funky Pants <br>
Link to the dataset: Google Trends Keyword Funky Pants <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>
Description: The Funky Pants dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>

<b> Dataset #21: Sweater Vests </b>

Keyword: Sweater Vests -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Sweater Vests <br>
Link to the dataset: Google Trends Keyword Sweater Vests <br>
Number of observations: 209  <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>
Description: The Sweater Vests dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>

<b> Dataset #22: Linen Pants </b>

Keyword: Linen Pants -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Linen Pants <br>
Link to the dataset: Google Trends Keyword Linen Pants <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>
Description: The Linen Pants dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>


<b> Dataset #23: Tube Tops </b>

Keyword: Tube Tops -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Tube Tops <br>
Link to the dataset: Google Trends Keyword Tube Tops  <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, and Region: New York, California, or Texas) <br>
Description: The Tube Tops dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>


<b> Dataset #24: Baggy pants </b>

Keyword: Baggy pants -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Baggy pants  <br>
Link to the dataset: Google Trends Keyword Baggy pants <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, 3 regions (New York, California, and Texas)) <br>
Description: The Baggy pants dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>

<b> Dataset #25: Low-rise </b>

Keyword: Low-rise -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Low-rise <br> 
Link to the dataset: Google Trends Keyword Low-rise <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, 3 regions (New York, California, and Texas)) <br>
Description: The Low-rise dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>

<b> Dataset #26: Crochet  </b>

Keyword: Crochet -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Crochet <br>
Link to the dataset: Google Trends Keyword Crochet <br>
Number of observations: 209  <br>
Number of variables: 3 (Interest Score, Trend, 3 regions (New York, California, and Texas)) <br>
Description: The Crochet dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>

<b> Dataset #27: Platform Sandals </b>

Keyword: Platform Sandals -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Platform Sandals <br>
Link to the dataset: Google Trends Keyword Platform Sandals <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, 3 regions (New York, California, and Texas)) <br>
Description: The Platform Sandals dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>


<b> Dataset #28: Tomato Girl </b>

Keyword: Tomato Girl -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Tomato Girl <br>
Link to the dataset: Google Trends Keyword Tomato Girl <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, 3 regions (New York, California, and Texas)) <br>
Description: The Tomato Girl dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>

<b> Dataset #29: Soft Girl Aesthetic </b>

Keyword: Soft Girl Aesthetic -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Soft Girl Aesthetic <br>
Link to the dataset: Google Trends Keyword Soft Girl Aesthetic <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, 3 regions (New York, California, and Texas)) <br>
Description: The Soft Girl Aesthetic dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>

<b> Dataset #30: Mermaid Core </b>

Keyword: Mermaid Core -> Top TikTok Fashion Trend between 2019 and 2023 <br>
Dataset Name: Mermaid Core <br>
Link to the dataset: Google Trends Keyword Mermaid Core <br>
Number of observations: 209 <br>
Number of variables: 3 (Interest Score, Trend, 3 regions (New York, California, and Texas)) <br>
Description: The Mermaid Core dataset, obtained through Pytrends API from Google Trends, is structured in a dataframe. It tracks interest scores over time and includes datetime, integer scores, and regional strings ('TX', 'CA', 'NY'). Data cleaning is necessary to transform datetime into integer months and to create trend-specific columns through one-hot encoding. This process aids in preparing the data for a Machine Learning prediction model. Interest scores are to be categorized as 'low', 'rising', 'popular', or 'trending'. <br>

In [None]:
## YOUR CODE TO LOAD/CLEAN/TIDY/WRANGLE THE DATA GOES HERE
## FEEL FREE TO ADD MULTIPLE CELLS PER SECTION
%pip install pytrends

Collecting pytrends
  Downloading pytrends-4.9.2-py3-none-any.whl (15 kB)
Collecting lxml
  Downloading lxml-5.1.0-cp310-cp310-macosx_10_9_x86_64.whl (4.7 MB)
Installing collected packages: lxml, pytrends
Successfully installed lxml-5.1.0 pytrends-4.9.2
You should consider upgrading via the '/usr/local/bin/python3 -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.


In [29]:
# Relevant Imports
from pytrends.request import TrendReq
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [30]:
# List of Top 30 TikTok trends that we will iterate through
TikTokTrends = ["Y2K", "Cottagecore", "E-Girl", "Vintage Thrift", "Fairycore", "Vanilla Girl",
                "Clean Girl Aesthetic", "Blokecore", "Barbie Challenge", "Shirt Jackets", "Balletcore",
               "Coastal Grandmother", "Gingham", "Maxi Skirts", "Corset", "Leg Warmers", "Birkenstocks", "Cloud Slides",
               "Leather", "Funky Pants", "Sweater Vests", "Linen Pants", "Tube Tops", "Baggy pants", "Low-rise", "Crochet",
               "Platform Sandals", "Tomato Girl", "Soft Girl Aesthetic", "Mermaid Core"]
# Multiple dataframes where one dataframe per trend will be in this list
df_per_trend = []

In [31]:
# Loop over every trend in TikTokTrends, fetch its weekly Google Trends interest scores for
# 2020 to 2023 in each state, and append one dataframe per trend to df_per_trend.

# Initialize pytrends once and reuse it for every request
pytrends = TrendReq(hl='en-US', tz=360)

# Timeframe and geographic locations shared by all trends
timeframe = '2020-01-01 2023-12-31'
geo_locations = ['US-CA', 'US-TX', 'US-NY']  # California, Texas, New York

for trend in TikTokTrends:
    # pytrends expects a list of keywords; here it holds a single trend
    kw_list = [trend]
    print(kw_list)

    # Dictionary mapping each region to its interest-score series
    trends_data = {}

    # Fetch the data for each location
    for geo in geo_locations:
        pytrends.build_payload(kw_list, cat=0, timeframe=timeframe, geo=geo, gprop='')
        data = pytrends.interest_over_time()
        if not data.empty:
            trends_data[geo] = data[trend]

    # Skip trends with no data in any region; pd.concat raises a ValueError on an empty dict
    if not trends_data:
        continue

    # Combine data from the different regions into one DataFrame
    df_curr_trend = pd.concat(trends_data, axis=1)

    df_curr_trend = df_curr_trend.reset_index()
    df_curr_trend["trend"] = trend
    df_per_trend.append(df_curr_trend)


['Y2K']
['Cottagecore']
['E-Girl']
['Vintage Thrift']
['Fairycore']
['Vanilla Girl']
['Clean Girl Aesthetic']
['Blokecore']
['Barbie Challenge']
['Shirt Jackets']
['Balletcore']
['Coastal Grandmother']
['Gingham']
['Maxi Skirts']
['Corset']
['Leg Warmers']
['Birkenstocks']
['Cloud Slides']
['Leather']
['Funky Pants']
['Sweater Vests']
['Linen Pants']
['Tube Tops']
['Baggy pants']
['Low-rise']
['Crochet']
['Platform Sandals']
['Tomato Girl']
['Soft Girl Aesthetic']
['Mermaid Core']


In [32]:
# Merging dataframes for all trends together
df_all_trends = pd.concat(df_per_trend, ignore_index=True)

In [33]:
df_all_trends.head()

Unnamed: 0,date,US-CA,US-TX,US-NY,trend
0,2020-01-05,13,17.0,17,Y2K
1,2020-01-12,15,15.0,12,Y2K
2,2020-01-19,11,14.0,15,Y2K
3,2020-01-26,9,14.0,17,Y2K
4,2020-02-02,8,9.0,20,Y2K


In [34]:
len(df_all_trends)
'''
5643
'''
df_all_trends["trend"].value_counts()
'''
Y2K                     209
Corset                  209
Crochet                 209
Low-rise                209
Baggy pants             209
Tube Tops               209
Linen Pants             209
Sweater Vests           209
Funky Pants             209
Leather                 209
Cloud Slides            209
Birkenstocks            209
Leg Warmers             209
Maxi Skirts             209
Cottagecore             209
Gingham                 209
Coastal Grandmother     209
Balletcore              209
Shirt Jackets           209
Barbie Challenge        209
Blokecore               209
Clean Girl Aesthetic    209
Vanilla Girl            209
Fairycore               209
Vintage Thrift          209
E-Girl                  209
Platform Sandals        209
'''

'\nY2K                     209\nCorset                  209\nCrochet                 209\nLow-rise                209\nBaggy pants             209\nTube Tops               209\nLinen Pants             209\nSweater Vests           209\nFunky Pants             209\nLeather                 209\nCloud Slides            209\nBirkenstocks            209\nLeg Warmers             209\nMaxi Skirts             209\nCottagecore             209\nGingham                 209\nCoastal Grandmother     209\nBalletcore              209\nShirt Jackets           209\nBarbie Challenge        209\nBlokecore               209\nClean Girl Aesthetic    209\nVanilla Girl            209\nFairycore               209\nVintage Thrift          209\nE-Girl                  209\nPlatform Sandals        209\n'
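The counts above list 27 trends (27 × 209 = 5643 rows), while `TikTokTrends` holds 30 names. A minimal sketch of how the missing trends could be identified is shown below. The small `expected_trends` list and `df` frame are stand-ins we made up for illustration; in the notebook they would be `TikTokTrends` and `df_all_trends`.

```python
import pandas as pd

# Stand-in for TikTokTrends (assumption: shortened for illustration)
expected_trends = ["Y2K", "Tomato Girl", "Mermaid Core"]

# Stand-in for df_all_trends with only one trend present
df = pd.DataFrame({"trend": ["Y2K", "Y2K"]})

# Trends that were requested but have no rows in the merged dataframe
present = set(df["trend"])
missing = [t for t in expected_trends if t not in present]
print(missing)
```

Any trend reported here has no rows in the merged data, so its one-hot column would be all zeros and the model would have nothing to learn from it.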

In [35]:
# Renaming Columns
df_all_trends.rename(columns={'US-CA': 'US-CA_Interest_Score',
                   'US-TX': 'US-TX_Interest_Score', 'US-NY': 'US-NY_Interest_Score'}, inplace=True)
df_all_trends.head()

Unnamed: 0,date,US-CA_Interest_Score,US-TX_Interest_Score,US-NY_Interest_Score,trend
0,2020-01-05,13,17.0,17,Y2K
1,2020-01-12,15,15.0,12,Y2K
2,2020-01-19,11,14.0,15,Y2K
3,2020-01-26,9,14.0,17,Y2K
4,2020-02-02,8,9.0,20,Y2K


In [36]:
# Data cleaning to assign the labels 'low', 'rising', 'popular', and 'trending' to each data point.
# These labels will be the output of the prediction.
def bin_interest_score(num):
    # Keep missing scores missing; otherwise NaN fails every comparison and is labeled 'trending'
    if pd.isna(num):
        return np.nan
    if num < 25:
        return 'low'
    elif num < 50:
        return 'rising'
    elif num < 75:
        return 'popular'
    else:
        return 'trending'

df_all_trends['US-CA_Interest_Label'] = df_all_trends['US-CA_Interest_Score'].apply(bin_interest_score)
df_all_trends['US-TX_Interest_Label'] = df_all_trends['US-TX_Interest_Score'].apply(bin_interest_score)
df_all_trends['US-NY_Interest_Label'] = df_all_trends['US-NY_Interest_Score'].apply(bin_interest_score)
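The same binning can also be written in vectorized form. The sketch below uses `pd.cut` with left-closed bins so that the boundaries match the thresholds above (25, 50, and 75 start a new label). The `scores` series is made-up example data, not values from our dataset.

```python
import numpy as np
import pandas as pd

# Made-up example scores, including boundary values and a missing value
scores = pd.Series([0, 24, 25, 49.0, 50, 74, 75, 100, np.nan])

# right=False makes each bin [lower, upper), matching the `<` comparisons
labels = pd.cut(
    scores,
    bins=[-np.inf, 25, 50, 75, np.inf],
    labels=["low", "rising", "popular", "trending"],
    right=False,
)
print(labels.tolist())
```

With `pd.cut`, a missing score stays unlabeled (NaN) rather than falling into one of the four categories.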

In [37]:
df_all_trends.iloc[5200:5600]

Unnamed: 0,date,US-CA_Interest_Score,US-TX_Interest_Score,US-NY_Interest_Score,trend,US-CA_Interest_Label,US-TX_Interest_Label,US-NY_Interest_Label
5200,2023-07-16,32,23.0,13,Low-rise,rising,low,low
5201,2023-07-23,0,0.0,0,Low-rise,low,low,low
5202,2023-07-30,33,0.0,16,Low-rise,rising,low,low
5203,2023-08-06,82,19.0,0,Low-rise,trending,low,low
5204,2023-08-13,26,0.0,13,Low-rise,rising,low,low
...,...,...,...,...,...,...,...,...
5595,2023-02-05,29,30.0,33,Platform Sandals,rising,rising,rising
5596,2023-02-12,42,35.0,28,Platform Sandals,rising,rising,rising
5597,2023-02-19,45,57.0,38,Platform Sandals,rising,popular,rising
5598,2023-02-26,34,60.0,57,Platform Sandals,rising,popular,popular


In [38]:
# Extract the month from each date into a new numeric 'month' column
df_all_trends['date'] = pd.to_datetime(df_all_trends['date'])
df_all_trends.insert(1, 'month', df_all_trends['date'].dt.month)
df_all_trends.head()

Unnamed: 0,date,month,US-CA_Interest_Score,US-TX_Interest_Score,US-NY_Interest_Score,trend,US-CA_Interest_Label,US-TX_Interest_Label,US-NY_Interest_Label
0,2020-01-05,1,13,17.0,17,Y2K,low,low,low
1,2020-01-12,1,15,15.0,12,Y2K,low,low,low
2,2020-01-19,1,11,14.0,15,Y2K,low,low,low
3,2020-01-26,1,9,14.0,17,Y2K,low,low,low
4,2020-02-02,2,8,9.0,20,Y2K,low,low,low


In [39]:
# One Hot Encoding for Trends
TikTokTrends = ["Y2K", "Cottagecore", "E-Girl", "Vintage Thrift", "Fairycore", "Vanilla Girl",
                "Clean Girl Aesthetic", "Blokecore", "Barbie Challenge", "Shirt Jackets", "Balletcore",
               "Coastal Grandmother", "Gingham", "Maxi Skirts", "Corset", "Leg Warmers", "Birkenstocks", "Cloud Slides",
               "Leather", "Funky Pants", "Sweater Vests", "Linen Pants", "Tube Tops", "Baggy pants", "Low-rise", "Crochet",
               "Platform Sandals", "Tomato Girl", "Soft Girl Aesthetic", "Mermaid Core"]

for trend in TikTokTrends:
    df_all_trends[trend] = [1 if value == trend else 0 for value in df_all_trends['trend']]

In [40]:
df_all_trends.head()

Unnamed: 0,date,month,US-CA_Interest_Score,US-TX_Interest_Score,US-NY_Interest_Score,trend,US-CA_Interest_Label,US-TX_Interest_Label,US-NY_Interest_Label,Y2K,...,Sweater Vests,Linen Pants,Tube Tops,Baggy pants,Low-rise,Crochet,Platform Sandals,Tomato Girl,Soft Girl Aesthetic,Mermaid Core
0,2020-01-05,1,13,17.0,17,Y2K,low,low,low,1,...,0,0,0,0,0,0,0,0,0,0
1,2020-01-12,1,15,15.0,12,Y2K,low,low,low,1,...,0,0,0,0,0,0,0,0,0,0
2,2020-01-19,1,11,14.0,15,Y2K,low,low,low,1,...,0,0,0,0,0,0,0,0,0,0
3,2020-01-26,1,9,14.0,17,Y2K,low,low,low,1,...,0,0,0,0,0,0,0,0,0,0
4,2020-02-02,2,8,9.0,20,Y2K,low,low,low,1,...,0,0,0,0,0,0,0,0,0,0


In [42]:
df_all_trends.columns

Index(['date', 'month', 'US-CA_Interest_Score', 'US-TX_Interest_Score',
       'US-NY_Interest_Score', 'trend', 'US-CA_Interest_Label',
       'US-TX_Interest_Label', 'US-NY_Interest_Label', 'Y2K', 'Cottagecore',
       'E-Girl', 'Vintage Thrift', 'Fairycore', 'Vanilla Girl',
       'Clean Girl Aesthetic', 'Blokecore', 'Barbie Challenge',
       'Shirt Jackets', 'Balletcore', 'Coastal Grandmother', 'Gingham',
       'Maxi Skirts', 'Corset', 'Leg Warmers', 'Birkenstocks', 'Cloud Slides',
       'Leather', 'Funky Pants', 'Sweater Vests', 'Linen Pants', 'Tube Tops',
       'Baggy pants', 'Low-rise', 'Crochet', 'Platform Sandals', 'Tomato Girl',
       'Soft Girl Aesthetic', 'Mermaid Core'],
      dtype='object')

In [43]:
# Drop date and trend columns
df_all_trends = df_all_trends.drop(['date', 'trend'], axis=1)

In [46]:
df_all_trends.columns

Index(['month', 'US-CA_Interest_Score', 'US-TX_Interest_Score',
       'US-NY_Interest_Score', 'US-CA_Interest_Label', 'US-TX_Interest_Label',
       'US-NY_Interest_Label', 'Y2K', 'Cottagecore', 'E-Girl',
       'Vintage Thrift', 'Fairycore', 'Vanilla Girl', 'Clean Girl Aesthetic',
       'Blokecore', 'Barbie Challenge', 'Shirt Jackets', 'Balletcore',
       'Coastal Grandmother', 'Gingham', 'Maxi Skirts', 'Corset',
       'Leg Warmers', 'Birkenstocks', 'Cloud Slides', 'Leather', 'Funky Pants',
       'Sweater Vests', 'Linen Pants', 'Tube Tops', 'Baggy pants', 'Low-rise',
       'Crochet', 'Platform Sandals', 'Tomato Girl', 'Soft Girl Aesthetic',
       'Mermaid Core'],
      dtype='object')
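Our research question treats region as a model input, but the dataframe currently stores one score column per state. One possible next step, sketched here under our own assumptions, is to reshape the scores into a long format with a single `region` column. The toy `wide` frame and the output column names `region` and `interest_score` are made up for illustration; the label columns are left out because they could be recomputed from the long scores.

```python
import pandas as pd

# Toy frame shaped like df_all_trends (without the label columns)
wide = pd.DataFrame({
    "month": [1, 2],
    "US-CA_Interest_Score": [13, 8],
    "US-TX_Interest_Score": [17.0, 9.0],
    "US-NY_Interest_Score": [17, 20],
    "Y2K": [1, 1],
})

# Columns holding the per-state scores; everything else identifies the row
score_cols = [c for c in wide.columns if c.endswith("_Interest_Score")]
id_cols = [c for c in wide.columns if c not in score_cols]

# One row per (original row, state) pair
long = wide.melt(id_vars=id_cols, value_vars=score_cols,
                 var_name="region", value_name="interest_score")

# Reduce 'US-CA_Interest_Score' to the state code 'CA'
long["region"] = long["region"].str.extract(r"US-([A-Z]{2})_Interest_Score", expand=False)
print(long)
```

In this layout the region could be one-hot encoded the same way as the trend names.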

# Ethics & Privacy

- Thoughtful discussion of ethical concerns included
- Ethical concerns consider the whole data science process (question asked, data collected, data being used, the bias in data, analysis, post-analysis, etc.)
- How your group handled bias/ethical concerns clearly described

Acknowledge and address any ethics & privacy related issues of your question(s), proposed dataset(s), and/or analyses. Use the information provided in lecture to guide your group discussion and thinking. If you need further guidance, check out [Deon's Ethics Checklist](http://deon.drivendata.org/#data-science-ethics-checklist). In particular:

- Are there any biases/privacy/terms of use issues with the data you proposed?
- Are there potential biases in your dataset(s), in terms of who it composes, and how it was collected, that may be problematic in terms of it allowing for equitable analysis? (For example, does your data exclude particular populations, or is it likely to reflect particular human biases in a way that could be a problem?)
- How will you set out to detect these specific biases before, during, and after/when communicating your analysis?
- Are there any other issues related to your topic area, data, and/or analyses that are potentially problematic in terms of data privacy and equitable impact?
- How will you handle issues you identified?

# Team Expectations 


Read over the [COGS108 Team Policies](https://github.com/COGS108/Projects/blob/master/COGS108_TeamPolicies.md) individually. Then, include your group’s expectations of one another for successful completion of your COGS108 project below. Discuss and agree on what all of your expectations are. Discuss how your team will communicate throughout the quarter and consider how you will communicate respectfully should conflicts arise. By including each member’s name above and by adding their name to the submission, you are indicating that you have read the COGS108 Team Policies, accept your team’s expectations below, and have every intention to fulfill them. These expectations are for your team’s use and benefit — they won’t be graded for their details.

* *Team Expectation 1*
* *Team Expectation 2*
* *Team Expectation 3*
* ...

# Project Timeline Proposal

Specify your team's specific project timeline. An example timeline has been provided. Change the dates, times, names, and details to fit your group's plan.

If you think you will need any special resources or training outside what we have covered in COGS 108 to solve your problem, then your proposal should state these clearly. For example, if you have selected a problem that involves implementing multiple neural networks, please state this so we can make sure you know what you’re doing and so we can point you to resources you will need to implement your project. Note that you are not required to use outside methods.



| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 1/20  |  1 PM | Read & Think about COGS 108 expectations; brainstorm topics/questions  | Determine best form of communication; Discuss and decide on final project topic; discuss hypothesis; begin background research | 
| 1/26  |  10 AM |  Do background research on topic | Discuss ideal dataset(s) and ethics; draft project proposal | 
| 2/1  | 10 AM  | Edit, finalize, and submit proposal; Search for datasets  | Discuss Wrangling and possible analytical approaches; Assign group members to lead each specific part   |
| 2/14  | 6 PM  | Import & Wrangle Data (Ant Man); EDA (Hulk) | Review/Edit wrangling/EDA; Discuss Analysis Plan   |
| 2/23  | 12 PM  | Finalize wrangling/EDA; Begin Analysis (Iron Man; Thor) | Discuss/edit Analysis; Complete project check-in |
| 3/13  | 12 PM  | Complete analysis; Draft results/conclusion/discussion (Wasp)| Discuss/edit full project |
| 3/20  | Before 11:59 PM  | NA | Turn in Final Project & Group Project Surveys |