<a href="https://colab.research.google.com/github/RMilos1/Titanic-Dataset/blob/main/YourLastName_YourFirstNameInitial_ChatGPT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h2><b>Titanic Data Analysis Assignment: Summaries, Visualization, and Modeling</b></h2>

<i>Developed by Milos Rodic</i>


---

### **📁 Before You Begin: Make Your Own Copy**

1. Open this notebook in **Google Colab** (File ▸ Open in Colab).
2. Save a personal copy to your Google Drive (**File ▸ Save a copy in Drive**).  
   All of your edits will be saved automatically.

> **Tip:** Colab lets you run each code cell with **Shift + Enter** or by clicking the ▶︎ button on the left of a cell.

---


<h2>Background</h2>

The Titanic dataset provides information about passengers on the RMS Titanic, which sank in 1912 after striking an iceberg.  
Each row represents a passenger and contains features such as:

* **Survived** (0 = No, 1 = Yes) – the target for classification  
* **Pclass** (1, 2, 3) – ticket class  
* **Name, Sex, Age** – demographic data  
* **SibSp, Parch** – number of siblings/spouses or parents/children aboard  
* **Fare** – ticket price  

You will practice loading data, computing descriptive statistics, creating visualizations, and building simple predictive models.

**Deliverables**

* A Colab/Jupyter notebook (**YourLastName_YourFirstNameInitial‑ChatGPT.ipynb**) containing your code & outputs.
* A document (**YourLastName_YourFirstNameInitial‑Titanic_Assignment.docx**) that logs your ChatGPT prompts & responses.

Within this notebook, paste each prompt _first_ and then the answer returned by ChatGPT. Use a different font color or highlight for clarity (e.g., wrap prompts in markdown block quotes).

---


In [None]:
# 📦 Install essential libraries for data analysis and visualization
!pip install --quiet pandas matplotlib seaborn scikit-learn

---

### **Part 1 – Reading the Titanic Dataset**

Use the code cell below to load **titanic.csv** from a public URL and display the first few rows.  
Verify that the columns match the dataset description.

> 📝 **Ask ChatGPT to write or explain the code for this step, then paste your prompt(s) in a cell below the code. Make sure the data shown corresponds to the data in the dataset, and use the example code below to compare results.**


In [None]:
# Example code that reads in the dataset and displays its first five rows
import pandas as pd

# raw CSV on GitHub
url = "https://raw.githubusercontent.com/RMilos1/Titanic-Dataset/main/Titanic-Dataset.csv"

# read the data
titanic = pd.read_csv(url)

# show a quick preview
titanic.head()   # first 5 rows


|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |
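
The column check asked for in this part can also be done in code rather than by eye. The following is a minimal sketch, not the required solution: `missing_columns` and `EXPECTED_COLUMNS` are hypothetical names introduced here, and the empty stand-in frame only copies the header of the preview above so the snippet runs without a download.

```python
import pandas as pd

# Columns named in the Background section of this notebook.
EXPECTED_COLUMNS = ["Survived", "Pclass", "Name", "Sex", "Age", "SibSp", "Parch", "Fare"]


def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from df (hypothetical helper)."""
    return [col for col in expected if col not in df.columns]


# Stand-in frame with the same header as the preview above; in the notebook,
# pass the `titanic` DataFrame loaded earlier instead.
preview = pd.DataFrame(
    columns=["PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
             "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"]
)

absent = missing_columns(preview)
print("Missing columns:", absent)
```

An empty list means every column from the dataset description is present; any name in the list is worth raising with ChatGPT in your prompt.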


In [None]:
#Enter the generated code here


In [None]:
"""
🔶 Paste your ChatGPT prompt(s) and answer(s) for **Part 1** here.

"""

---

### **Part 2 – Descriptive Statistics (Mean, Median, Mode, Std)**

Calculate the **mean, median, mode, and standard deviation** for the **Age** and **Fare** columns.

1. First, ask ChatGPT for these four statistics _directly_ (without running code).
2. Paste the prompt(s) and ChatGPT's answer in the cell below (use triple‑quoted strings or markdown).
3. Note any assumptions ChatGPT makes about missing values or data distribution.



In [None]:
"""
🔶 Paste your ChatGPT prompt(s) and answer(s) for **Part 2** here.

"""

---

### **Part 3 – Python Code to Verify Statistics**

Now ask ChatGPT to generate Python code that **computes the same statistics** from the actual dataset.  
Run that code below and compare the results with the values ChatGPT claimed in Part 2.

> Include both your prompt(s) _and_ ChatGPT's generated code in the first cell—then execute the second cell to see the real numbers.
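
For reference when checking ChatGPT's code, here is one possible shape for the calculation. It is a sketch under stated assumptions: `summarize` is a hypothetical helper, and the five-row `sample` frame just repeats the Part 1 preview so the snippet runs on its own. The numbers it prints describe those five rows only, not the full dataset.

```python
import pandas as pd

# First five rows from the Part 1 preview, used only to keep this sketch
# self-contained; in the notebook, pass the full `titanic` DataFrame instead.
sample = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})


def summarize(series):
    """Mean, median, mode, and sample standard deviation of one column (hypothetical helper)."""
    values = series.dropna()  # pandas skips NaN by default; this makes the choice explicit
    return {
        "missing": int(series.isna().sum()),
        "mean": values.mean(),
        "median": values.median(),
        "mode": values.mode().tolist(),  # a column can have more than one mode
        "std": values.std(),             # ddof=1, i.e. the sample standard deviation
    }


stats = {col: summarize(sample[col]) for col in ["Age", "Fare"]}
for col, result in stats.items():
    print(col, result)
```

When the real numbers differ from what ChatGPT claimed in Part 2, two likely causes are how missing values were handled and whether a population (`ddof=0`) or sample (`ddof=1`) standard deviation was assumed.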


In [None]:
"""
🔶 Paste your ChatGPT prompt(s) and generated code for **Part 3** here.

"""

In [None]:
# ✅ Execute generated code below



---

### **Part 4 – Strategy to Compare Distributions**

Ask ChatGPT to outline a **statistical or visual strategy** for comparing the distribution of either **Age** _or_ **Fare** between passengers who survived and those who did not. If ChatGPT does not propose a scatter plot with a line of best fit, ask it to include both.

Paste your prompt(s) and the resulting explanation below.
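
One simple statistical strategy ChatGPT may suggest is a side-by-side summary of the two groups. The sketch below illustrates that idea only; `compare_by_survival` is a hypothetical helper, and the five-row `sample` frame repeats values from the Part 1 preview so the snippet is self-contained.

```python
import pandas as pd

# Values from the Part 1 preview; in the notebook, use the full `titanic` DataFrame.
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})


def compare_by_survival(df, column):
    """Summarize one numeric column separately for Survived = 0 and Survived = 1."""
    return df.groupby("Survived")[column].agg(["count", "mean", "median", "std"])


fare_by_survival = compare_by_survival(sample, "Fare")
print(fare_by_survival)
```

A large gap between the group means or medians is a hint worth following up with a plot or a formal test, which is what Part 5 asks you to visualize.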


In [None]:
"""
🔶 Paste your ChatGPT prompt(s) and answer(s) for **Part 4** here.

"""

---

### **Part 5 – Code to Visualize Your Strategy**

Request Python code from ChatGPT that implements the visualization method chosen in Part 4 (e.g., histogram with KDE, box plot, swarm plot, etc.).

Paste the prompt in the first cell below, then run the generated code in the second cell to produce the chart.

> **Hint:** `seaborn` makes comparison plots easy with `sns.histplot`, `sns.boxplot`, or `sns.violinplot`.
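
As a point of comparison for the code ChatGPT returns, a box plot split by survival status might look like the sketch below. `plot_by_survival` is a hypothetical helper, and the five-row `sample` frame repeats values from the Part 1 preview so the snippet runs without the download; a plot of five points is not meaningful on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Values from the Part 1 preview; in the notebook, use the full `titanic` DataFrame.
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})


def plot_by_survival(df, column):
    """Draw one box per survival group for the given numeric column and return the Axes."""
    fig, ax = plt.subplots(figsize=(6, 4))
    sns.boxplot(data=df, x="Survived", y=column, ax=ax)
    ax.set_title(f"{column} by survival status")
    return ax


ax = plot_by_survival(sample, "Fare")
```

In Colab the figure renders when the cell finishes; elsewhere, call `plt.show()` afterwards. Swapping `sns.boxplot` for `sns.violinplot` keeps the same arguments.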


In [None]:
"""
🔶 Paste your ChatGPT prompt(s) and generated code for **Part 5** here.

"""

In [None]:
# ✅ Execute the visualization code here


---

### **Part 6 – Predicting Fare from Age (Linear Regression)**

1. Ask ChatGPT whether **Age** can reasonably predict **Fare** in this dataset and what regression assumptions must hold.  
2. Paste that discussion below.
3. Then prompt ChatGPT for Python code that:

   * Builds a simple linear regression model with **Age → Fare**  
   * Fits the model  
   * Reports model coefficients and R²  
   * Predicts the fare for a **40‑year‑old** passenger

Paste ChatGPT's code in the first cell and run it in the second cell.
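
The four steps listed above could be sketched with scikit-learn as follows. This is one possible approach, not the expected answer: `fit_age_to_fare` is a hypothetical helper, and the five-row `sample` frame repeats values from the Part 1 preview so the snippet is self-contained. Coefficients fitted on five rows say nothing about the full dataset.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Values from the Part 1 preview; in the notebook, use the full `titanic` DataFrame.
sample = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})


def fit_age_to_fare(df):
    """Fit Fare ~ Age on rows where both values are present; return the model and its R^2."""
    data = df[["Age", "Fare"]].dropna()  # LinearRegression does not accept NaN
    X = data[["Age"]]                    # 2-D feature matrix with a single column
    y = data["Fare"]
    model = LinearRegression().fit(X, y)
    return model, model.score(X, y)      # score() is R^2 on the data passed in


model, r_squared = fit_age_to_fare(sample)
predicted_fare = model.predict(pd.DataFrame({"Age": [40]}))[0]

print(f"slope={model.coef_[0]:.3f}, intercept={model.intercept_:.3f}, R^2={r_squared:.3f}")
print(f"Predicted fare at age 40: {predicted_fare:.2f}")
```

Note that the R² here is computed on the same rows used for fitting, so it describes fit, not predictive accuracy on unseen passengers.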


In [None]:
"""
🔶 Paste your ChatGPT prompt(s) and generated code for **Part 6** here.

"""

In [None]:
# ✅ Execute the regression code here


---

### **Part 7 (Extra Credit) – Predicting Survival with Logistic Regression**

If you choose to pursue extra credit:

1. Select relevant predictors (e.g., **Age, Fare, Sex, Pclass**).  
2. Encode **Sex** as numeric.  
3. Split the data into **training** and **test** sets.  
4. Fit a **logistic regression** model.  
5. Evaluate the model with **accuracy, precision, recall**, and/or an **ROC curve**.

Paste your prompt(s) and ChatGPT's code & commentary below. Then run the code to see the results. Ask ChatGPT to explain the code output step-by-step.

📌 _Be sure to interpret the coefficients and the evaluation metrics._
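
The five steps above could be wired together roughly as in the sketch below. It is an illustration under stated assumptions, not the required solution: `fit_survival_model` and `FEATURES` are hypothetical names, the 0/1 encoding of **Sex** is one arbitrary choice, and the five-row `sample` frame repeats values from the Part 1 preview purely so the snippet runs. Metrics from a one-row test set are meaningless; run it on the full dataset for real results.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Values from the Part 1 preview; in the notebook, use the full `titanic` DataFrame.
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

FEATURES = ["Age", "Fare", "Sex", "Pclass"]


def fit_survival_model(df, test_size=0.2, seed=42):
    """Encode Sex, split the data, fit logistic regression, and score it on the test split."""
    data = df[FEATURES + ["Survived"]].dropna().copy()         # drop rows with missing values
    data["Sex"] = data["Sex"].map({"male": 0, "female": 1})    # numeric encoding (assumed mapping)
    X_train, X_test, y_train, y_test = train_test_split(
        data[FEATURES], data["Survived"], test_size=test_size, random_state=seed
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    predictions = model.predict(X_test)
    metrics = {
        "accuracy": accuracy_score(y_test, predictions),
        "precision": precision_score(y_test, predictions, zero_division=0),
        "recall": recall_score(y_test, predictions, zero_division=0),
    }
    return model, metrics


model, metrics = fit_survival_model(sample)
print(dict(zip(FEATURES, model.coef_[0])))  # one coefficient per predictor
print(metrics)
```

On the full dataset, passing `stratify=data["Survived"]` to `train_test_split` keeps the survival rate similar in both splits. A positive coefficient raises the predicted log-odds of survival as that predictor increases.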


In [None]:
"""
🔶 Paste your ChatGPT prompt(s) and generated code for **Part 7** here.

"""

In [None]:
# ✅ Execute the code here
import pandas as pd