**If you lost points on the last checkpoint you can get them back by responding to TA/IA feedback**  

Update/change the relevant sections where you lost those points, make sure you respond on GitHub Issues to your TA/IA to call their attention to the changes you made here.

Please update your Timeline... no battle plan survives contact with the enemy, so make sure we understand how your plans have changed.

# COGS 108 - Data Checkpoint

# Names

- Christine Tang
- Charlene Hsu
- Katelyn Chan
- Khushi Raghuvanshi

# Research Question

Did age or sex have the larger influence on the survivability of individuals on Western passenger ships that sank in the 1900s, and how did this differ between ships? Given a passenger's ship, age, and sex, what would their likelihood of survival be?

## Background and Prior Work

Our topic was inspired by the movie Titanic, in which survival chances depended on many factors. We want to home in on three ship sinkings from the early to the late 1900s, namely the Titanic (1912), the RMS Lusitania (1915), and the MS Estonia (1994), to determine how these variables relate to survival rate. We also want to compare the early and late 1900s to see whether the influence of these factors changed over time, given the significant historical events in between, one being the second wave of the feminist movement.

From prior knowledge and the movie Titanic, we noticed that gender, age, and social class strongly affected survival, since lifeboat access prioritized women, children, and higher-class passengers. While researching ship sinkings online, we found a source that reports Titanic survival percentages by class, gender, and age, noting that first-class passengers “had the highest survival rate at 62 percent”, while women and children had survival rates of about “75 percent and 50 percent respectively” <a name="cite_ref-1"></a>[<sup>1</sup>](#cite_note-1). Because this information comes from an academic source, we trust its figures; however, we collected our own datasets from Kaggle so that we can analyze trends and relationships across all three ships and explore how they change over time.

Another source, which examines how survival rates depend on gender and on whether a person was a crew member or a passenger, hypothesizes that women have a higher survival rate when the “WCF (Women and Children First)” rule is followed. Because women were, on average, at a physical disadvantage in an evacuation, the source argues that whether men are willing to help them, that is, whether the rule is actually enforced, determines the true survival rates on sinking ships (based on the Titanic and Lusitania findings). Ultimately, Mikael Elinder and Oscar Erixson found that cultural conditions play a large role in the relative survival of men, women, and children during maritime disasters, as does how moral sentiments are weighed in the moment <a name="cite_ref-2"></a>[<sup>2</sup>](#cite_note-2). Many confounding variables that affect survivability are often overlooked, which motivates us to find concrete demographic evidence first and to research additional influential factors later.


1. <a name="cite_note-1"></a> [^](#cite_ref-1) “Titanic.” Bowdoin, 2020, https://courses.bowdoin.edu/history-2203-fall-2020-kmoyniha/reflection/#:~:text=First%20class%20passengers%20had%20the,survived%20(Takis%2C%201999)
2. <a name="cite_note-2"></a> [^](#cite_ref-2) Elinder, Mikael, and Oscar Erixson. “Gender, Social Norms, and Survival in Maritime Disasters.” ResearchGate, July 2012, https://www.researchgate.net/publication/230589088_Gender_Social_Norms_and_Survival_in_Maritime_Disasters
3. “Titanic Survivors • Titanic Facts.” Titanic Facts, 13 July 2020, https://titanicfacts.net/titanic-survivors/
4. “Survival Rates of Passengers and Crew. Survival Rates of Children Are...” ResearchGate, https://www.researchgate.net/figure/Survival-rates-of-passengers-and-crew-Survival-rates-of-children-are-only-available-for_fig1_230589088

# Hypothesis


Our hypothesis is that the relevance of personal characteristics to survival rate changed over the course of the 20th century. We hypothesize that age and gender were the factors most affecting survival rate in the early 1900s, and that their effect on survival rates had decreased by the late 1900s. Since our datasets cover passenger ships that sank across the 1900s, we predict that the correlation between passengers’ gender and their survivability weakened over time due to campaigns for gender equality.

# Data

## Data overview

Ideally, our data would let us use age, sex, class, and ticket price as the main variables for examining how each affects passenger survival. The ideal datasets would cover passenger ships that sank in the 1900s and include thorough, detailed information on every passenger’s age, class, gender, and ticket price, so that the proportion of survivors within each group could be computed. The three sinkings we will analyze are the Titanic, the RMS Lusitania, and the MS Estonia. We will compare proportions rather than counts because the number of passengers differs between ships, so raw counts are not directly comparable.
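
As a toy illustration of why proportions compare better than counts, the sketch below uses made-up numbers (not taken from our datasets). For a 0/1 `survived` column, the survival proportion is simply the column mean.

```python
import pandas as pd

# Hypothetical toy data (NOT from our datasets): two ships of very different sizes
big_ship = pd.DataFrame({"survived": [1] * 300 + [0] * 700})    # 1000 people, 300 survivors
small_ship = pd.DataFrame({"survived": [1] * 150 + [0] * 150})  # 300 people, 150 survivors

# Raw counts suggest the big ship "did better" (300 vs 150 survivors)...
counts = {"big": int(big_ship["survived"].sum()), "small": int(small_ship["survived"].sum())}

# ...but the survival proportion (mean of a 0/1 column) shows the opposite
proportions = {
    "big": float(big_ship["survived"].mean()),
    "small": float(small_ship["survived"].mean()),
}

print(counts)       # {'big': 300, 'small': 150}
print(proportions)  # {'big': 0.3, 'small': 0.5}
```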

For each dataset include the following information
- Dataset #1: 
  - Dataset Name: Titanic Dataset 
  - Link to the dataset: https://www.kaggle.com/datasets/sakshisatre/titanic-dataset?select=Titanic+Dataset.csv
  - Number of observations: 1309
  - Number of variables: 14
- Dataset #2:
  - Dataset Name: RMS Lusitania Complete Passenger Manifest
  - Link to the dataset: https://www.kaggle.com/datasets/rkkaggle2/rms-lusitania-complete-passenger-manifest/data
  - Number of observations: 1961
  - Number of variables: 16
- Dataset #3:
  - Dataset Name: The Estonia Disaster Passenger List
  - Link to the dataset: https://www.kaggle.com/datasets/christianlillelund/passenger-list-for-the-estonia-ferry-disaster
  - Number of observations: 989
  - Number of variables: 8

Each dataset above covers a different ship sinking from a different year within the 1900s (Titanic in 1912, RMS Lusitania in 1915, MS Estonia in 1994). The primary variables we will use are age, sex, and survived (whether a passenger survived). The data cleaning required is to remove passengers with missing information, rename columns to be consistent between datasets, and standardize the format of the values (converting age to whole numbers and encoding sex and survival the same way in every dataset).
We will not combine the datasets; instead, we will treat each as a different time point within the 20th century and compare how important each demographic variable was. To make these comparisons valid, we standardize the variables across datasets. The details of the variables in each dataset can be seen in the data cleaning process below.
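
One way to make the standardization step reusable is a small helper that selects the three columns, renames them, and encodes the labels. This is a hypothetical sketch: the function name `standardize` and its parameters are our own, and the toy frame only mimics the Lusitania column names (Fate, Sex, Age, Title) used in the cleaning steps.

```python
import pandas as pd

def standardize(df, column_map, survived_map=None, sex_map=None):
    """Hypothetical helper: reduce a ship dataset to the shared survived/sex/age schema.

    column_map   -- maps this dataset's column names to 'survived', 'sex', 'age'
    survived_map -- optional mapping of raw survival labels to 1/0
    sex_map      -- optional mapping of raw sex labels to 'F'/'M'
    """
    # Keep only the mapped columns, rename them, and copy so later edits are safe
    out = df[list(column_map)].rename(columns=column_map).copy()
    if survived_map is not None:
        out["survived"] = out["survived"].map(survived_map)
    if sex_map is not None:
        out["sex"] = out["sex"].map(sex_map)
    # Always return the columns in the same order
    return out[["survived", "sex", "age"]]

# Toy rows that mimic the Lusitania column names (not real passengers)
lus_like = pd.DataFrame({
    "Fate": ["Lost", "Saved"],
    "Sex": ["Male", "Female"],
    "Age": [38.0, 25.0],
    "Title": ["Mr.", "Mrs."],
})
std = standardize(
    lus_like,
    {"Fate": "survived", "Sex": "sex", "Age": "age"},
    survived_map={"Lost": 0, "Saved": 1},
    sex_map={"Male": "M", "Female": "F"},
)
print(std)
```

For the Estonia data, which already uses 0/1 and M/F, only the column map would be needed, e.g. `standardize(estonia_df, {'Survived': 'survived', 'Sex': 'sex', 'Age': 'age'})`.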

### Imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
import seaborn as sns

## Titanic Dataset

In [2]:
titanic_df = pd.read_csv("Titanic Dataset.csv")
titanic_df

Unnamed: 0,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest
0,1,1,"Allen, Miss. Elisabeth Walton",female,29.00,0,0,24160,211.3375,B5,S,2,,"St Louis, MO"
1,1,1,"Allison, Master. Hudson Trevor",male,0.92,1,2,113781,151.5500,C22 C26,S,11,,"Montreal, PQ / Chesterville, ON"
2,1,0,"Allison, Miss. Helen Loraine",female,2.00,1,2,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
3,1,0,"Allison, Mr. Hudson Joshua Creighton",male,30.00,1,2,113781,151.5500,C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON"
4,1,0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.00,1,2,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1304,3,0,"Zabour, Miss. Hileni",female,14.50,1,0,2665,14.4542,,C,,328.0,
1305,3,0,"Zabour, Miss. Thamine",female,,1,0,2665,14.4542,,C,,,
1306,3,0,"Zakarian, Mr. Mapriededer",male,26.50,0,0,2656,7.2250,,C,,304.0,
1307,3,0,"Zakarian, Mr. Ortin",male,27.00,0,0,2670,7.2250,,C,,,


## Column description

Name | Description
-------|------------
**pclass** | Ticket class indicating socio-economic status; 1 = upper, 2 = middle, 3 = lower
**survived** | A binary indication: 1 = survived, 0 otherwise
**name** | full name of passenger
**sex** | Gender of passenger, male or female
**age** | the age of passengers in years
**sibsp** | The number of siblings or spouses the passenger had aboard the Titanic
**parch** | The number of parents or children they had on board with them
**ticket** | The ticket number assigned to the passenger
**fare** | The fare paid by the passenger for the ticket
**cabin** | The cabin number assigned to the passenger, if available
**embarked** | The port of embarkation for the passenger. It can take one of three values: C = Cherbourg, Q = Queenstown, S = Southampton
**boat** | If the passenger survived, this column contains the identifier of the lifeboat they were rescued in
**body** | If the passenger did not survive, this column contains the identification number of their recovered body, if applicable
**home.dest** | The destination or place of residence of the passenger

In [3]:
titanic_df.shape

(1309, 14)

We are going to remove the columns that will not be used in our analysis.

In [4]:
titanic_df.columns

Index(['pclass', 'survived', 'name', 'sex', 'age', 'sibsp', 'parch', 'ticket',
       'fare', 'cabin', 'embarked', 'boat', 'body', 'home.dest'],
      dtype='object')

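We keep only the survived, sex, and age columns. Calling .copy() makes the result an independent DataFrame, so the column assignments below do not trigger pandas' SettingWithCopyWarning.

In [5]:
# Keep only the columns used in the analysis, as an independent copy
titanic_df = titanic_df[["survived", "sex", "age"]].copy()
titanic_df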

In [6]:
titanic_df.describe()

Unnamed: 0,survived,age
count,1309.0,1046.0
mean,0.381971,29.881138
std,0.486055,14.413493
min,0.0,0.17
25%,0.0,21.0
50%,0.0,28.0
75%,1.0,39.0
max,1.0,80.0


In [7]:
titanic_df.dtypes

survived      int64
sex          object
age         float64
dtype: object

First, we change 'female' and 'male' to 'F' and 'M' for consistency with the other datasets.

In [8]:
titanic_df['sex'].value_counts()

male      843
female    466
Name: sex, dtype: int64

In [9]:
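def sex_update(sex):
    """Map the Titanic 'female'/'male' labels to 'F'/'M'."""
    # Compare whole strings: a substring test like "'male' in sex" is also True for 'female'
    if sex == 'female':
        return 'F'
    if sex == 'male':
        return 'M'
    # Unexpected labels become NaN so they show up in the null check below
    return np.nan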

In [10]:
titanic_df['sex'] = titanic_df['sex'].apply(sex_update)



In [11]:
titanic_df

Unnamed: 0,survived,sex,age
0,1,F,29.00
1,1,M,0.92
2,0,F,2.00
3,0,M,30.00
4,0,F,25.00
...,...,...,...
1304,0,F,14.50
1305,0,F,
1306,0,M,26.50
1307,0,M,27.00
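
A substring test such as `'male' in s` is also true for `'female'`, so matching labels by whole string is safer. A vectorized alternative to applying a Python function row by row is `Series.map` with a dictionary; this is a sketch on a toy series, not a change to the notebook's cell.

```python
import pandas as pd

# Toy stand-in for titanic_df['sex'] (hypothetical values, same labels as the real column)
sex = pd.Series(["female", "male", "male", "female"])

# Series.map looks up each whole label in the dict; labels missing from the dict become NaN
sex_short = sex.map({"female": "F", "male": "M"})
print(sex_short.tolist())  # ['F', 'M', 'M', 'F']
```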


Now we're going to tackle the problem of null values

In [12]:
titanic_df.isna().any()

survived    False
sex         False
age          True
dtype: bool

We can see that only age has null values.

In [13]:
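# Placeholder: missing ages are set to 0 (indistinguishable from infants once ages are truncated to int)
titanic_df['age'] = titanic_df['age'].fillna(0)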
titanic_df.head()



Unnamed: 0,survived,sex,age
0,1,F,29.0
1,1,M,0.92
2,0,F,2.0
3,0,M,30.0
4,0,F,25.0


The ages are stored as floats (infant ages are fractional, e.g. 0.92), and we want integers, so we create a function to convert them. Note that int() truncates toward zero rather than rounding.

In [14]:
def float_to_int(df_ar):
    output = int(df_ar)
    return output 
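
Truncation and rounding give different results for fractional ages, so an infant aged 0.92 becomes 0 under int(). The sketch below compares the vectorized equivalent of float_to_int (`astype(int)`) with rounding first; the values are the fractional ages visible in the Titanic preview. Rounding is shown only as an option, not as the project's chosen method.

```python
import pandas as pd

# Fractional ages taken from the Titanic preview above (0.92 and 14.50), plus a whole age
ages = pd.Series([0.92, 14.5, 29.0])

truncated = ages.astype(int)          # drops the fractional part, like int()
rounded = ages.round().astype(int)    # NumPy-style rounding: halves go to the nearest even number

print(truncated.tolist())  # [0, 14, 29]
print(rounded.tolist())    # [1, 14, 29]
```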

In [15]:
titanic_df['age'] = titanic_df['age'].apply(float_to_int)
titanic_df



Unnamed: 0,survived,sex,age
0,1,F,29
1,1,M,0
2,0,F,2
3,0,M,30
4,0,F,25
...,...,...,...
1304,0,F,14
1305,0,F,0
1306,0,M,26
1307,0,M,27
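
Filling missing ages with 0 and then truncating makes a missing age look exactly like an infant under one year old (compare rows 1 and 1305 above). A sketch of the alternative that matches the stated plan of removing passengers with missing information, on toy values: on the real frame this would be `titanic_df.dropna(subset=['age'])` before converting to integers. Filtering on `age > 0` afterwards, as done for the Estonia data, would also drop real infants.

```python
import numpy as np
import pandas as pd

# Toy ages: an infant (0.92), a missing age (NaN), and an adult (29.0)
ages = pd.Series([0.92, np.nan, 29.0])

# Current approach: fill missing with 0, then truncate to int
filled = ages.fillna(0).astype(int)
print(filled.tolist())  # [0, 0, 29]: the infant and the missing age are now identical

# Alternative: drop missing ages first, then convert what is left,
# so a 0 can only mean "under one year old"
dropped = ages.dropna().astype(int)
print(dropped.tolist())  # [0, 29]
```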


Just to get an idea of our cleaned data, we will group it by survived and gender to see how many females and males survived.

In [16]:
titanic_df.groupby(['survived', 'sex']).count()

Unnamed: 0_level_0,Unnamed: 1_level_0,age
survived,sex,Unnamed: 2_level_1
0,F,127
0,M,682
1,F,339
1,M,161
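
These counts are not directly comparable between sexes because far more men (843) than women (466) were aboard. A sketch of turning such a table into survival proportions with `pd.crosstab(..., normalize="index")`, using toy rows rather than the real data:

```python
import pandas as pd

# Toy data with the cleaned schema (hypothetical values, not the real Titanic rows)
df = pd.DataFrame({
    "survived": [1, 1, 0, 0, 0, 1],
    "sex":      ["F", "F", "F", "M", "M", "M"],
})

# normalize="index" divides each row by its total, giving the share of each sex that survived (1) or not (0)
rates = pd.crosstab(df["sex"], df["survived"], normalize="index")
print(rates)
```

Equivalently, `df.groupby("sex")["survived"].mean()` gives the proportion of each sex that survived.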


## RMS Lusitania Complete Passenger Manifest


Loading the CSV and previewing the first rows

In [3]:
df_lus = pd.read_csv('LusitaniaManifest.csv')
df_lus.head()

Unnamed: 0.1,Unnamed: 0,Family name,Title,Personal name,Fate,Age,Department/Class,Passenger/Crew,Citizenship,Position,Status,City,Lifeboat,Rescue Vessel,Adult/Minor,Sex
0,0,CAMERON,Mr.,Charles W.,Lost,38.0,Band,Crew,British,,,,,,Adult,Male
1,1,CARR-JONES,Mr.,E.,Lost,37.0,Band,Crew,British,,,,,,Adult,Male
2,2,DRAKEFORD,Mr.,Edward,Saved,30.0,Band,Crew,British,Violin,,,,,Adult,Male
3,3,HAWKINS,Mr.,Handel,Saved,25.0,Band,Crew,British,Cello,,,,,Adult,Male
4,4,HEMINGWAY,Mr.,John William,Saved,27.0,Band,Crew,British,Double Bass,,,,,Adult,Male


## Column Description

Name | Description
-------|------------
**Unnamed: 0** | Serial number of the record
**Family name** | Last name of the individual
**Title** | Honorific of the individual (e.g., Mr.)
**Personal name** | First name(s) of the individual
**Fate** | Whether the individual was Lost or Saved
**Age** | Age of the individual in years
**Department/Class** | Passenger class, or department for crew members (e.g., Band)
**Passenger/Crew** | Whether the individual was a passenger or a crew member
**Citizenship** | Country of origin of the individual
**Position** | Specific occupation, if the individual was a member of the crew
**Status** | Marital status of the individual
**City** | City of destination of the individual
**Lifeboat** | Indicates whether the individual found a lifeboat
**Rescue Vessel** | Indicates whether the individual was rescued by another vessel
**Adult/Minor** | Age category of the individual (Adult or Minor)
**Sex** | Gender of the individual

In [4]:
df_lus.columns

Index(['Unnamed: 0', 'Family name', 'Title', 'Personal name', 'Fate', 'Age',
       'Department/Class', 'Passenger/Crew', 'Citizenship', 'Position',
       'Status', 'City', 'Lifeboat', 'Rescue Vessel', 'Adult/Minor', 'Sex'],
      dtype='object')

In [5]:
df_lus.shape

(1961, 16)

In [6]:
df_lus.dtypes

Unnamed: 0            int64
Family name          object
Title                object
Personal name        object
Fate                 object
Age                 float64
Department/Class     object
Passenger/Crew       object
Citizenship          object
Position             object
Status               object
City                 object
Lifeboat             object
Rescue Vessel        object
Adult/Minor          object
Sex                  object
dtype: object

We keep only Fate, Sex, and Age, renaming them to survived, sex, and age to match the Titanic dataset.

In [7]:
df_lus = df_lus[['Fate', 'Sex', 'Age']]
df_lus = df_lus.rename({'Fate': 'survived', 'Sex': 'sex', 'Age': 'age'}, axis='columns')
df_lus

Unnamed: 0,survived,sex,age
0,Lost,Male,38.0
1,Lost,Male,37.0
2,Saved,Male,30.0
3,Saved,Male,25.0
4,Saved,Male,27.0
...,...,...,...
1956,Saved,Male,
1957,Lost,Male,
1958,Lost,Male,
1959,Saved,Male,16.0


Replacing Survived and Sex Values

In [8]:
df_lus = df_lus.replace('Lost', 0).replace('Saved', 1)
df_lus = df_lus.replace('Male', 'M').replace('Female', 'F')
df_lus

Unnamed: 0,survived,sex,age
0,0,M,38.0
1,0,M,37.0
2,1,M,30.0
3,1,M,25.0
4,1,M,27.0
...,...,...,...
1956,1,M,
1957,0,M,
1958,0,M,
1959,1,M,16.0


Filling missing ages with 0 and converting the age values to integers (0 therefore marks a missing age)

In [9]:
df_lus['age'] = df_lus['age'].fillna(0).astype(int)
df_lus.head()

Unnamed: 0,survived,sex,age
0,0,M,38
1,0,M,37
2,1,M,30
3,1,M,25
4,1,M,27


Checking for NaNs (none should remain, since missing ages were filled with 0 above)

In [10]:
df_lus.isna().any()

survived    False
sex         False
age         False
dtype: bool

Describing Data and Checking Values

In [11]:
df_lus.describe()

Unnamed: 0,survived,age
count,1961.0,1961.0
mean,0.391637,21.672106
std,0.488241,19.108447
min,0.0,0.0
25%,0.0,0.0
50%,0.0,25.0
75%,1.0,36.0
max,1.0,76.0
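
The minimum and 25th percentile of age are both 0 here, which is consistent with many ages being the placeholder 0 rather than real infants, so the mean and quartiles above are pulled down. A sketch, on toy values, of recomputing summary statistics with the placeholder treated as missing again; on the real frame this would be `df_lus['age'].replace(0, np.nan).describe()`. This also hides any real infant recorded as 0.

```python
import numpy as np
import pandas as pd

# Toy ages after fillna(0).astype(int): zeros mark missing ages
age = pd.Series([38, 37, 0, 0, 25, 16])

print(age.mean())  # about 19.3, pulled down by the placeholder zeros

# Treat 0 as missing again so summary statistics only use recorded ages
recorded = age.replace(0, np.nan)
print(recorded.mean())   # 29.0
print(recorded.count())  # 4 recorded ages
```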


In [12]:
df_lus['sex'].value_counts(normalize=True)

sex
M    0.735849
F    0.264151
Name: proportion, dtype: float64

## The Estonia Disaster Passenger List

In [17]:
estonia_df = pd.read_csv('estonia-passenger-list.csv')
estonia_df

Unnamed: 0,PassengerId,Country,Firstname,Lastname,Sex,Age,Category,Survived
0,1,Sweden,ARVID KALLE,AADLI,M,62,P,0
1,2,Estonia,LEA,AALISTE,F,22,C,0
2,3,Estonia,AIRI,AAVASTE,F,21,C,0
3,4,Sweden,JURI,AAVIK,M,53,C,0
4,5,Sweden,BRITTA ELISABET,AHLSTROM,F,55,P,0
...,...,...,...,...,...,...,...,...
984,985,Sweden,ANNA INGRID BIRGITTA,OSTROM,F,60,P,0
985,986,Sweden,ELMAR MIKAEL,OUN,M,34,P,1
986,987,Sweden,ENN,QUNAPUU,M,77,P,0
987,988,Sweden,LY,GUNAPUU,F,87,P,0


## Column description

Name | Description
-------|------------
**PassengerId** | Distinct number given to each passenger on the MS Estonia
**Country** | Country the passenger was from
**Firstname** | First name of passenger
**Lastname** | Last name of passenger
**Sex** | Gender of passenger, M (male) or F (female)
**Age** | Age of passenger in years
**Category** | Role of the person on board: C = crew, P = passenger
**Survived** | Binary indication of whether the passenger survived (1) or not (0)

In [18]:
estonia_df.shape

(989, 8)

In [19]:
estonia_df.columns

Index(['PassengerId', 'Country', 'Firstname', 'Lastname', 'Sex', 'Age',
       'Category', 'Survived'],
      dtype='object')

We're only focusing on the columns 'Survived', 'Sex', and 'Age'.

In [20]:
estonia_df = estonia_df[['Survived', 'Sex', 'Age']]
estonia_df

Unnamed: 0,Survived,Sex,Age
0,0,M,62
1,0,F,22
2,0,F,21
3,0,M,53
4,0,F,55
...,...,...,...
984,0,F,60
985,1,M,34
986,0,M,77
987,0,F,87


In [21]:
estonia_df.describe()

Unnamed: 0,Survived,Age
count,989.0,989.0
mean,0.138524,44.575329
std,0.345624,17.235146
min,0.0,0.0
25%,0.0,30.0
50%,0.0,44.0
75%,0.0,59.0
max,1.0,87.0


In [22]:
estonia_df.dtypes

Survived     int64
Sex         object
Age          int64
dtype: object

As you can see, age is an int, which is the type we will be working with in this project.

Now we check the number of survivors and the number of female and males in this dataset.

In [23]:
# Number of Survivors and Nonsurvivors
estonia_df['Survived'].value_counts()

0    852
1    137
Name: Survived, dtype: int64

In [24]:
# Number of Females and Males
estonia_df['Sex'].value_counts()

M    503
F    486
Name: Sex, dtype: int64

Check whether the dataset has any null values, since those would be missing data.

In [25]:
estonia_df.isna().any()

Survived    False
Sex         False
Age         False
dtype: bool

There are no null values, so we are good to continue.

Now we make sure there are no ages equal to or less than 0 that may skew our data.

In [26]:
# Check if all values in 'Age' are greater than 0
(estonia_df['Age'] > 0).all()

False

This means at least one age is 0 or less, so we remove those rows from our dataset. (An age of 0 could also record an infant under one year old, so this step may drop a real passenger.)

In [27]:
# Keep only those that have ages greater than 0
estonia_df = estonia_df.loc[estonia_df['Age'] > 0]
estonia_df

Unnamed: 0,Survived,Sex,Age
0,0,M,62
1,0,F,22
2,0,F,21
3,0,M,53
4,0,F,55
...,...,...,...
984,0,F,60
985,1,M,34
986,0,M,77
987,0,F,87


Now we group by 'Survived' and 'Sex' to see how many of each gender survived or not.

In [28]:
estonia_df.groupby(['Survived', 'Sex']).count()

Unnamed: 0_level_0,Unnamed: 1_level_0,Age
Survived,Sex,Unnamed: 2_level_1
0,F,458
0,M,393
1,F,27
1,M,110


# Ethics & Privacy

We will exclude the names of everyone in all three datasets; removing names is a good first step toward protecting privacy. Regarding the lack of representation of certain ethnic groups, it is important to acknowledge that historical datasets like those from the 1900s often suffer from significant underrepresentation and bias. Because our ships are primarily associated with the US, Europe, and the UK, the passengers are likely to be predominantly white due to historical demographics and immigration patterns. This lack of diversity can lead to biased analyses and interpretations if not addressed properly.

We will be transparent about the limitations of our analysis, one being that we only have three ship datasets to examine. We will not try to generalize our predictions to the whole world; instead, we will look at the similarities between these Western ships that sank and try to identify factors that affected their survival rates.

We will also clearly communicate that our findings are specific to historical passenger ship incidents and should not be directly extrapolated to modern contexts without further research. We emphasize the historical nature of our study and the specific conditions of the 1900s. 

Our analysis might unintentionally overlook the broader social and economic contexts that influenced survival rates, such as class disparities or gender roles, which could lead to an incomplete understanding of the factors at play since our data insufficiently covers these demographic factors. We will include discussions on the broader social and economic contexts in our analysis, acknowledging the influence of class, gender, and other factors on survival rates to provide a more nuanced interpretation of the data. 

One of the primary biases that may affect our results is that some passengers’ ages are missing, which may skew the data towards people who survived or whose families were more willing to talk about deceased relatives. Members of minority groups may also have been less willing to share private information.

With regard to unintended consequences, since the information is historical and our project only makes predictions for passengers on ships that sank in the 1900s, the people most likely to be affected are family members of those who died in these sinkings. We mitigate this by anonymizing individuals and reporting only aggregate percentages for each group. Our cleaned data does not include race or ethnicity, so we draw no conclusions about them, and no individual's gender will be linked to their name in our project.

Finally, because we analyze only three ship datasets, our findings might not be robust enough to support strong conclusions, which could lead to misinterpretation or overconfidence in the results. We will therefore communicate the limitations of our study transparently, including the limited scope of our datasets, and emphasize that our analysis is exploratory and that further research with more comprehensive data is needed to confirm our findings. By considering and addressing these potential unintended consequences, we aim to conduct our analysis responsibly and ethically: respecting the privacy of individuals, acknowledging historical biases, and providing a balanced, contextually accurate interpretation of the data.

# Team Expectations 


Read over the [COGS108 Team Policies](https://github.com/COGS108/Projects/blob/master/COGS108_TeamPolicies.md) individually. Then, include your group’s expectations of one another for successful completion of your COGS108 project below. Discuss and agree on what all of your expectations are. Discuss how your team will communicate throughout the quarter and consider how you will communicate respectfully should conflicts arise. By including each member’s name above and by adding their name to the submission, you are indicating that you have read the COGS108 Team Policies, accept your team’s expectations below, and have every intention to fulfill them. These expectations are for your team’s use and benefit — they won’t be graded for their details.

* We meet at least once a week for at least an hour.
* Meet in person at least bi-weekly to see our progress as a whole (2 hours).
* Do the tasks that are delegated to you. 
* Try our best to stick to the timeline.
* Communicate effectively with each other.
* Be on time, or notify the team.
* Be respectful of everyone's ideas. 

# Project Timeline Proposal

Specify your team's specific project timeline. An example timeline has been provided. Change the dates, times, names, and details to fit your group's plan.

If you think you will need any special resources or training outside what we have covered in COGS 108 to solve your problem, then your proposal should state these clearly. For example, if you have selected a problem that involves implementing multiple neural networks, please state this so we can make sure you know what you’re doing and so we can point you to resources you will need to implement your project. Note that you are not required to use outside methods.



| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 4/26  |  5 PM | Look through old projects  | Determine best form of communication; Complete the project review and discuss the old projects. | 
| 5/3  |  5 PM |  Find topics that interest you, and the questions we might want to answer | Discuss and settle on a project idea; Try to find datasets we'd want to use; Complete the project proposal. | 
| 5/10  | 5 PM  | Look at the feedback on the proposal and think about how we can approach our analysis | Revise the proposal; Discuss wrangling and possible analytical approaches; Assign group members to lead each specific part   |
| 5/17  | 5 PM  | Import & Wrangle Data | Review the wrangling and discuss plan for analysis; Submit the Checkpoint #1 |
| 5/24  | 5 PM  | Finalize wrangling/EDA; Begin Analysis | Discuss/edit analysis; Ideally complete the analysis, or agree on a plan to do so |
| 5/31  | 5 PM  | Complete analysis| Discuss the conclusions; Submit Checkpoint #2 |
|  6/7  | 5 PM  | Draft results/conclusion/discussion | Finalize anything that's left|
| 6/12  | Before 11:59 PM  | NA | Turn in Final Project & Group Project Surveys |