<a href="https://colab.research.google.com/github/UP-DSSoc/Jupyter-Supplemental-Exercises/blob/main/DS100_WS1_Exercises.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Workshop 1: Foundations of Data Science | Machine Exercises

**Date**: October 11, 2025 @ 4-6 PM  
**Guest Mentor**: Mr. Lanz Avila Lagman, MSc.  
**Format**: Synchronous Online  

*This notebook covers Python fundamentals, loading data with pandas, exploratory data analysis (EDA), basic data visualization, and summary statistics.*

Copyright &copy; 2025 UP Data Science Society. All Rights Reserved.

---

## Part 1 | Preliminaries

### 1. Organizing your Files

**Instructions:**

1. Store all your project-specific code inside one folder, for example a folder named `Project folder`.
2. Inside `Project folder`, create a subdirectory named `Data`. Within `Data`, create two subfolders:

   * `Input`: for all raw input files.
   * `Output`: for all processed or generated output files.

Your final directory structure should look like this:

```
Project folder/
├── notebook_1.ipynb
├── notebook_2.ipynb
├── notebook_3.ipynb
└── Data/
    ├── Input/
    └── Output/
```
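
If you prefer to create this layout from code rather than by hand, the following is a minimal sketch using Python's standard `pathlib` module. The helper name `create_project_layout` is an assumption made for illustration, and the folder is created relative to the current working directory.

```python
from pathlib import Path


def create_project_layout(root="Project folder"):
    """Create Data/Input and Data/Output under root (hypothetical helper)."""
    root_path = Path(root)
    for subfolder in ("Input", "Output"):
        # parents=True also creates root and Data if they are missing;
        # exist_ok=True makes the call safe to repeat
        (root_path / "Data" / subfolder).mkdir(parents=True, exist_ok=True)
    return root_path


project_root = create_project_layout()
for folder in sorted(project_root.rglob("*")):
    if folder.is_dir():
        print(folder)
```

Running the cell a second time leaves existing folders and files untouched, so it is safe to keep at the top of a notebook.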

---

#### 1.1 Setting Up Your Working Environment

Before diving into analysis, ensure that your environment is properly set up for clarity and consistency.

**Steps:**

1. Open **Jupyter Notebook** (or **JupyterLab**) and navigate to your `Project folder`.
2. Create a new notebook file (e.g., `workshop1_exercises.ipynb`).
3. Use clear naming conventions for notebooks, such as `EDA_students.ipynb` or `sales_analysis.ipynb`.
4. Remember to **save** your notebook frequently.
5. Learn essential keyboard shortcuts:

   * `Shift + Enter` → Run a cell
   * `A` / `B` → Add a cell above or below
   * `M` → Convert a cell to Markdown
   * `Y` → Convert a cell to Code

**Tip:** Keep your workspace tidy. Only open the notebooks you need to avoid confusion.

---

#### 1.2 Importing the Right Libraries

Having your imports organized at the top of your notebook makes your workflow clean and reproducible.

**Standard imports for data analysis:**

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

**Guidelines:**

* Use standard aliases (`pd`, `np`, `plt`) for consistency.
* Keep all imports in one cell at the top of your notebook.
* If you install new packages, note them in a separate text file (like `requirements.txt`).

**Optional:** For project isolation, use a virtual environment with `conda` or `venv` to manage dependencies.

---

#### 1.3 Keeping Your Workspace Clean

A clean workspace helps prevent errors and keeps your analysis readable.

**Best practices:**

* Restart and clear outputs regularly: *Kernel → Restart & Clear Output*
* Use `df.head()` instead of printing entire DataFrames.
* Comment out or remove experimental code before saving.
* Add section headers (e.g., `## 🧹 Data Cleaning`) to separate workflow stages.
* Save both your notebook and output files regularly.

**Pro Tip:** A well-maintained notebook is easier to debug, share, and revisit later.


### 2. Making your Notebooks Easier to Read

A well-written notebook is not just about *getting results*; it's about making your analysis **understandable and reproducible**.
Readable notebooks help both you and your collaborators quickly grasp what's happening, even months after the code was written.

---

#### 2.1 Writing Clear Markdown Cells

**Markdown** is the main language for formatting text in Jupyter Notebooks.
Use it to make your notebooks look clean and professional, similar to writing a report with headings, lists, and emphasis.

**Common Markdown Syntax:**

| Purpose       | Syntax                     | Example Output                |
| ------------- | -------------------------- | ----------------------------- |
| Heading       | `#`, `##`, `###`           | ### This is a level 3 heading |
| Bold          | `**text**`                 | **Bold text**                 |
| Italic        | `*text*`                   | *Italic text*                 |
| Bullet list   | `- item`                   | - item                        |
| Numbered list | `1. item`                  | 1. item                       |
| Inline code   | `` `code` ``               | `pd.read_csv()`               |
| Code block    | `` ```python ... ``` ``    | A colored code box            |
| Image         | `![](path_to_image.png)`   | Embeds an image               |

🟢 *Tip:* Avoid using HTML tags unless you are embedding an image. Markdown keeps your notebook portable and clean.

---

#### 2.2 When to Use Markdown vs. Raw NBConvert

* **Markdown cells** are best for:

  * Explanations, observations, and results
  * Section headers
  * Instructions and insights for readers

* **Raw NBConvert cells** are useful only if:

  * You plan to export the notebook to another format (like PDF or LaTeX)
  * You need content that should *not* be interpreted or rendered by Markdown

🟨 *In most cases, stick with Markdown cells.* Raw cells are optional and used mainly for advanced notebook exports.

---

#### 2.3 Proper Spacing Between Markdown and Code Cells

* Leave **one blank line** between Markdown and code blocks; it helps visually separate logic from narrative.
* Group related cells together; don't mix explanations and code snippets randomly.
* Think of your notebook like a *story*: Markdown introduces context, code shows action, and outputs show results.

Example layout:

```markdown
### Exploring the Dataset
We start by loading the student dataset and viewing the first few rows.
```

In [None]:
import pandas as pd

df = pd.read_csv('Data/Input/students.csv')
df.head()

Unnamed: 0,Name,Age,Grade
0,Ana,20.0,2.25
1,Ben,,3.0
2,Carla,21.0,1.5
3,Dan,22.0,
4,Emma,20.0,1.75


---

#### 2.4 Adding Comments in Code Cells

Comments make your code easier to follow and maintain.
Use the hash (`#`) symbol to add short explanations.


In [None]:
# Load student dataset
df = pd.read_csv('Data/Input/students.csv')

# Display the first 5 rows to check structure
df.head()

Unnamed: 0,Name,Age,Grade
0,Ana,20.0,2.25
1,Ben,,3.0
2,Carla,21.0,1.5
3,Dan,22.0,
4,Emma,20.0,1.75


*Tips:*

* Try to write **why** you did something, not just **what** the code does.
* Keep comments short; write them mostly as phrases and only rarely as full sentences.
* Update comments whenever you change the code they describe.
* Comments may be placed above a single code line or a single code block.
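
As a small illustration of the first tip, the sketch below contrasts a comment that restates the code with one that explains the reasoning. The `grades` list is made-up sample data, with `None` standing in for a missing grade.

```python
grades = [2.25, 3.0, 1.5, None, 1.75]  # made-up sample data

# Restates the code (less helpful): remove None from the list
# Explains why (more helpful): a missing grade is not a zero, so it is
# left out of the average instead of distorting the result
recorded_grades = [g for g in grades if g is not None]

average = sum(recorded_grades) / len(recorded_grades)
print(f"Average of recorded grades: {average:.2f}")
```

Both comments sit above the same line, but only the second one would still be useful to a reader who already knows Python.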

---

#### 2.5 Following PEP 8 Style Guidelines

PEP 8 is the official style guide for writing clean and consistent Python code.
It ensures readability across projects and teams.

**Key Practices:**

* Use 4 spaces per indentation level (not tabs).
* Limit lines to 79 characters.
* Surround top-level functions and classes with two blank lines; separate methods inside a class with a single blank line.
* Use `snake_case` for variable and function names.
* Surround operators with a single space: `x = y + 2`, not `x=y+2`.
* Keep imports at the top of your notebook.

Example:

```python
# ✅ Good
def compute_average(scores):
    total = sum(scores)
    return total / len(scores)


# ❌ Bad
def ComputeAverage(Scores): total=sum(Scores); return total/len(Scores)
```

Following PEP 8 helps your notebook look professional and readable, a habit that scales well into real-world data projects.

---

🧠 **Reflection Question:**
* Take a look at one of your past notebooks. How can you make it more readable and well-structured based on these guidelines?

## Part 2 | Coding Fundamentals

### 1. Basic Syntax, Variables, & Data Types

**Instructions**:

1. Create a dictionary called `student` with the following keys and values:

    - **"name"** → a string with the student's name, `"Juan"`
    - **"age"** → an integer with the student's age, `21`
    - **"grades"** → a list of 5 float values, `[1.25, 1.00, 2.75, 1.75, 1.50]`

2. Write a program that:

    - Prints the student's name and age in a single sentence.
    - Calculates and prints the average grade.
    - Adds a new key `"status"` to the dictionary with the value "**Excellent**" if the average is ≥ 1.75, otherwise "**Good**".
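
One possible approach is sketched below; treat it as a reference to compare against after attempting the exercise, not as the only valid answer. The threshold follows the instruction exactly as written, and the variable name `average` is a choice made for this sketch.

```python
student = {
    "name": "Juan",
    "age": 21,
    "grades": [1.25, 1.00, 2.75, 1.75, 1.50],
}

# Print the name and age in a single sentence
print(f"{student['name']} is {student['age']} years old.")

# Average = sum of the grades divided by how many there are
average = sum(student["grades"]) / len(student["grades"])
print(f"Average grade: {average:.2f}")

# Add the status key, following the rule stated in the instructions
student["status"] = "Excellent" if average >= 1.75 else "Good"
print(student["status"])
```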

In [None]:
# ANSWER FOR PROBLEM 1

### 2. Importing Libraries & Loading Data

**Instructions**:

1. Import the libraries **pandas** (as pd) and **matplotlib.pyplot** (as plt).
2. Read the dataset students.csv using **pd.read_csv()**.
3. Display the first 5 rows of the dataset.
4. Print the dataset's shape (number of rows and columns).
5. Use **.info()** and **.describe()** to view dataset details and summary statistics.
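
The steps above can be sketched as follows. Because `students.csv` is not bundled here, the sketch reads the five sample rows shown earlier in this notebook from an in-memory string via `io.StringIO`; with the real file you would pass its path (for example `'Data/Input/students.csv'`) to `pd.read_csv()` instead. `matplotlib` is left out of this sketch because none of these steps draw a plot.

```python
import io

import pandas as pd

# Stand-in for students.csv: the five sample rows shown earlier
csv_text = """Name,Age,Grade
Ana,20,2.25
Ben,,3.0
Carla,21,1.5
Dan,22,
Emma,20,1.75
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())      # first 5 rows
print(df.shape)       # (rows, columns)
df.info()             # column types and non-null counts (prints directly)
print(df.describe())  # summary statistics for numeric columns
```

Note that `df.info()` prints its report itself, so wrapping it in `print()` would add a stray `None` to the output.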

In [None]:
# ANSWER FOR PROBLEM 2

### 3. Functions & Loops

**Instructions:**

1. Create a function **count_even(numbers)** which:
    - Takes a list of integers as input.
    - Uses a loop to count how many numbers are even.
    - Returns the count.

2. Then:
    - Create a list of numbers from 1 to 20.
    - Call the function and print the result.
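
A minimal sketch of one way to meet these requirements is shown below. It uses the modulo operator (`%`) to test for evenness, which is a common choice but not the only one.

```python
def count_even(numbers):
    """Return how many integers in numbers are even."""
    count = 0
    for number in numbers:
        if number % 2 == 0:  # remainder 0 after dividing by 2 means even
            count += 1
    return count


numbers = list(range(1, 21))  # range stops before 21, so this is 1 to 20
print(count_even(numbers))
```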

In [None]:
# ANSWER FOR PROBLEM 3

### 4. Exploratory Data Analysis (EDA)

**Instructions**:

1. Load the dataset students.csv (make sure it has at least 20 rows, some with missing values).
2. Perform these steps:
    - Print the shape of the dataset (rows × columns).
    - Use .isnull().sum() to check for missing values in each column.
    - Identify which column has the most missing values.
    - Compute the following on the Grade column:
        * Mean
        * Median
        * Standard deviation
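
A minimal sketch of these checks, assuming a DataFrame with a `Grade` column. To stay self-contained, it rebuilds the five sample rows shown earlier in this notebook instead of loading the full 20-row file, so the numbers it prints will differ from yours.

```python
import numpy as np
import pandas as pd

# Stand-in data: the five sample rows shown earlier (NaN marks a missing value)
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Carla", "Dan", "Emma"],
    "Age": [20, np.nan, 21, 22, 20],
    "Grade": [2.25, 3.0, 1.5, np.nan, 1.75],
})

print(df.shape)  # (rows, columns)

missing = df.isnull().sum()  # missing values per column
print(missing)

# idxmax returns the label of the largest count (the first one on a tie)
print("Most missing values:", missing.idxmax())

# pandas skips NaN by default in these calculations
grade_mean = df["Grade"].mean()
grade_median = df["Grade"].median()
grade_std = df["Grade"].std()  # sample standard deviation (ddof=1)
print(grade_mean, grade_median, grade_std)
```

When two columns tie for the most missing values, `idxmax()` reports only the first, so it is worth reading the full `missing` output as well.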

In [None]:
# ANSWER FOR PROBLEM 4

### 5. Data Visualization

**Instructions**:

1. Load the dataset sales.csv.
2. Make a histogram of the Sales column.
3. Add labels for the x-axis (Sales), y-axis (Frequency), and a title.
4. Make a scatter plot of Sales vs Profit.
5. Label both axes, and add a title.
6. Based on the scatter plot, what relationship do you see between Sales and Profit?
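
The plotting steps can be sketched as below. Since `sales.csv` is not available here, the `Sales` and `Profit` values are invented purely to make the example run; replace the DataFrame with `pd.read_csv(...)` on the real file. The interpretation asked for in step 6 is left for you to answer from your own plot.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented numbers, only so the sketch runs without sales.csv
sales_df = pd.DataFrame({
    "Sales": [120, 150, 90, 200, 170, 130, 220, 160],
    "Profit": [20, 28, 12, 45, 33, 22, 50, 30],
})

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: how often each range of Sales values occurs
ax_hist.hist(sales_df["Sales"], bins=5)
ax_hist.set_xlabel("Sales")
ax_hist.set_ylabel("Frequency")
ax_hist.set_title("Distribution of Sales")

# Scatter plot: one point per row, Sales on x and Profit on y
ax_scatter.scatter(sales_df["Sales"], sales_df["Profit"])
ax_scatter.set_xlabel("Sales")
ax_scatter.set_ylabel("Profit")
ax_scatter.set_title("Sales vs Profit")

fig.tight_layout()
plt.show()
```

Placing both plots in one figure is a layout choice for this sketch; two separate figures would satisfy the instructions equally well.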

In [None]:
# ANSWER FOR PROBLEM 5

### 6. Summary Statistics

**Instructions**:

1. Load the dataset `coffee.csv` (i.e., data on the number of cups of coffee students drink during exam week).
2. Compute the following on the Cups_of_Coffee column:
    - Mean
    - Median
    - Mode
    - Variance
    - Standard deviation
3. What do these numbers tell us about student caffeine consumption during exams?
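
The five statistics can be computed with pandas as sketched below. The values in `Cups_of_Coffee` are invented so the example runs without `coffee.csv`, and the variable names are choices made for this sketch.

```python
import pandas as pd

# Invented sample, only so the sketch runs without coffee.csv
coffee_df = pd.DataFrame({"Cups_of_Coffee": [2, 3, 3, 4, 1, 3, 5, 2, 0, 3]})
cups = coffee_df["Cups_of_Coffee"]

mean_cups = cups.mean()
median_cups = cups.median()
mode_cups = cups.mode().tolist()  # mode() returns a Series: ties are possible
variance_cups = cups.var()        # sample variance (ddof=1 by default)
std_cups = cups.std()             # sample standard deviation (ddof=1)

print(mean_cups, median_cups, mode_cups, variance_cups, std_cups)
```

pandas divides by `n - 1` for variance and standard deviation by default; pass `ddof=0` if you want the population versions instead.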

In [None]:
# ANSWER FOR PROBLEM 6

## Part 3 | Working with Generative AI Responsibly in Data Science

Generative AI is a powerful collaborator, but it's not a replacement for your thinking.  
Used correctly, it can help you clarify concepts, explore patterns, and communicate insights better.  
Used carelessly, it can make you lazy, overconfident, or even wrong.

The key lesson: **Don't let AI think *for* you; make it think *with* you.**

---

### 1. Make AI Work *for* You, Not the Other Way Around

Generative AI should never dictate your reasoning.  
Your role as a data scientist is to **command** the analysis; AI is simply your assistant.

**Mindset:**
- You are the **expert**; AI is the **tool**.  
- Never copy outputs blindly; interpret, edit, and verify them.  
- Use AI for *clarity*, not for *authority.*

**Example Prompt:**
> "Here's my code for analyzing sales trends. Summarize what it does and point out potential improvements."

✅ *Good use:* Saves time explaining code logic and helps spot blind spots.  
❌ *Bad use:* Asking AI to "do the entire project" without understanding the result.

Remember: **don't be AI's servant; train it to serve your workflow.**

---

### 2. Always Verify and Challenge AI's Output

AI can hallucinate, meaning it can sound confident but be wrong.  
As a data scientist, your job is to **question everything**, especially outputs that look too neat.

**Good Habits:**
- Cross-check AI answers with actual data or trusted documentation.
- Ask follow-up questions like:
  > "Can you show me the reference for that claim?"  
  > "What assumptions might be wrong here?"  
- Compare AI's explanations with your own understanding before presenting.

**Mini-Challenge:**  
Try asking AI for the mean and median from your dataset, then compute them yourself.  
If there's a mismatch, *investigate why.*
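
A minimal sketch of this check, using only the standard library. The `ai_claimed` values are hypothetical stand-ins for whatever the AI tells you, `grades` reuses the four non-missing grades from the sample rows shown earlier, and the tolerance of 0.01 is an arbitrary choice.

```python
import math
import statistics

grades = [2.25, 3.0, 1.5, 1.75]  # non-missing sample grades
ai_claimed = {"mean": 2.5, "median": 2.0}  # hypothetical AI answers

computed = {
    "mean": statistics.mean(grades),
    "median": statistics.median(grades),
}

# Collect every statistic where the claim and the computation disagree
mismatches = [
    name for name, value in computed.items()
    if not math.isclose(value, ai_claimed[name], abs_tol=0.01)
]

for name in mismatches:
    print(f"{name}: AI said {ai_claimed[name]}, computed {computed[name]}")
```

Comparing with a tolerance rather than `==` avoids flagging harmless rounding differences as mismatches.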

**Rule of Thumb:**  
Treat AI output like you would treat another analyst's report: *be skeptical, then verify.*

---

### 3. Ask Thoughtfully and Engage Critically

AI gives better answers when you give better questions.  
Treat every prompt like a collaboration; the more context you give, the smarter the exchange becomes.

**Tips for Better Prompts:**
- Make your questions longer and more specific.  
  > "Summarize this dataset's distribution" → ❌  
  > "Using the students.csv dataset, describe the distribution of grades, note any outliers, and suggest how to visualize it" → ✅
- Include your reasoning or hypothesis.  
  AI understands better when it knows your intent.
- Debate with the model. If you disagree with its reasoning, ask *why*; it forces both you and the model to articulate clearly.
- When in doubt, use web search or real data to fact-check the response.

**In short:** Be a conversational scientist; question, discuss, and refine your insights with AI as your partner.

---

### 4. Use AI for Prototyping and Code Commentary

One of AI's strongest suits is **rapid prototyping**: turning rough ideas into working code quickly.  
However, the first version AI gives you is rarely the best one. Treat it like a draft to be refined.

**Best Uses:**
- Ask AI to write quick **starter code** that you can later optimize.  
- Let it suggest **alternative approaches** or **debug strategies** when you're stuck.  
- Use it to **add comments** or explain existing code for readability and documentation.

**Example Prompts:**
> "Add helpful inline comments to the code below."  
> "Rewrite this function with cleaner variable names."  
> "Can you refactor this data-cleaning loop into a pandas one-liner?"

**Why it Works:**
- You save time on boilerplate and documentation.  
- You gain fresh perspectives on structure and clarity.  
- You develop an iterative mindset: prototype, test, and improve.

**Caution:**  
Don't rely on AI for production-level accuracy. Always test its code and understand every line before using it in your project.

🧠 **Reflection Question:**  
How will you use AI to **enhance** your data science workflow (for thinking, coding, and communicating) without letting it take over your reasoning? Can you make learning materials using AI?


---

## 🎉 Congratulations!!!

**You made it to the end of all the exercises! Great job!** 👏  

You've covered Python basics, logic and loops, functions, classes, loading datasets, exploratory data analysis, visualizations, and summary statistics. That's a solid foundation in Python and Data Science!

Keep practicing, experiment with your own datasets, and don't be afraid to make mistakes because that's how real learning happens. Well done!

Best,  
*The Education & Research Committee*


