## Lab 03.00 - Intro to Notebooks and Matplotlib

### Part 0 - Exploring Jupyter Notebooks

Welcome to your first Jupyter Notebook! Before we dive into Matplotlib, let's explore some features of Jupyter Notebooks.

#### 0.0 - Notebook Structure and Cells
A Jupyter Notebook is made up of a series of cells. Each cell can contain either Python code or markdown.

This structure allows you to mix explanations with executable code, making it an excellent tool for data analysis, teaching, and sharing your work.

To **create** a new cell, hover over an existing cell: two buttons will appear that let you create a Code or Markdown cell. The new cell is placed directly beneath the existing one.

#### 0.1 - Markdown Cells

This is a markdown cell. You can use it to write formatted text, including:

- **Bold text**
- *Italic text*
- `Code snippets`
- Lists (like this one!)

##### Exercise 00
Double-click on this cell to see its raw markdown content. Then, create a **new** markdown cell *directly* below this one and try formatting your own text. Include a heading, a list, and some bold or italic text.

Make sure to save Markdown cells by clicking the ✓ (check) button that appears in the top-right corner while you are editing a cell.

#### 0.2 - Code Cells
The cell below is a code cell. You can write and execute Python code in these cells.

Click the play button next to the cell to execute it and see its output.

In [None]:
# This is a comment in a code cell

print("Hello, Jupyter!")

##### Exercise 02
Create a new code cell *directly* below this one. Write Python code that takes a name as input and then prints a greeting with that name, for example: `Howdy, Mr. Forlenza`.
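
One possible answer is sketched below. It keeps the greeting logic in a small helper function so it can be checked without typing at a prompt; the name `make_greeting` and its `greeting` parameter are hypothetical choices, not part of the exercise.

```python
def make_greeting(name, greeting="Howdy"):
    """Return a greeting for the given name, e.g. 'Howdy, Mr. Forlenza'."""
    # strip() removes stray spaces typed before or after the name
    return f"{greeting}, {name.strip()}"


# In the notebook you would read the name from the keyboard instead:
# name = input("What is your name? ")
name = "Mr. Forlenza"
print(make_greeting(name))
```

In the notebook, replace the hard-coded name with `input(...)`, which pauses the cell until you type a name and press Enter.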

#### 0.3 - Deleting Cells

To delete a cell, select it and click the trash icon that appears in the top right of the cell.

##### Exercise 03
Delete the cell **below** this one

<center>
    <img src="https://www.recovery-estonia.ee/wp-content/uploads/2017/02/deleted-file-recovery.jpg" width=300>
</center>

#### 0.4 - Collapsing and Expanding Code Cells

Jupyter Notebooks allow you to collapse code cells, which can be useful when you want to focus on specific parts of your notebook or when you're presenting your work.

To collapse a code cell:
1. Run the cell first (the output will still be visible when collapsed)
2. Click the small line to the left of the cell (or double-click the left sidebar of the cell)

To expand a collapsed cell, simply click on the blue line again.

##### Exercise 04
1. Run the code cell below
2. Try collapsing it using the method described above
3. Expand it again

In [None]:
print("Hello, I'm a collapsible cell!")
for i in range(10):
    print(f"Line {i+1}")

Collapsing cells can help keep your notebook tidy, especially when you have long code blocks or output that you don't need to see all the time.

Now that you're familiar with the basics of Jupyter Notebooks, let's move on to our Matplotlib lab!

### Part 1 - Intro with Matplotlib
Before we dive into using Matplotlib, let's explore its context and significance in the data visualization landscape. 

To answer the questions, edit the markdown cell and put your answer below the question. 

**Make sure to save the markdown cell by pressing the ✓ (check) icon in the top right after answering the questions**

Research and critically think about the following questions:

##### Question 00
What is Matplotlib, and what are its primary functions in data visualization for data science?
- **Answer:** Matplotlib is a Python library for creating static and interactive visualizations. It allows data scientists to produce various chart types, including line plots, bar charts, and histograms.

##### Question 01
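Describe three common types of plots that can be created with Matplotlib.
- **Answer:** A line graph, which connects points to show how a value changes over time or another ordered quantity; a bar graph, which compares values across separate categories; and a scatter plot, which shows the relationship between two numeric variables.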

Now that you've explored Matplotlib's role in data science, let's dive into using it!

By the end of this lab, you'll have a solid foundation in creating and customizing various types of plots using Matplotlib. This skill is crucial for data analysis, as visualizing your data can reveal patterns, trends, and insights that might not be apparent from raw numbers alone.

Let's get started by importing Matplotlib and creating our first plot!

First, install the Matplotlib library. In your terminal, run the following command:

```bash
pip3 install matplotlib
```

Now, let's import the necessary library:

In [None]:
import matplotlib.pyplot as plt
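
To confirm the install worked, an optional check like the following prints the installed version and the active backend (the part of Matplotlib that decides where figures are drawn). The exact values depend on your setup, so none are shown here.

```python
import matplotlib
import matplotlib.pyplot as plt

# An ImportError on the lines above would mean the install did not succeed
print("Matplotlib version:", matplotlib.__version__)
# In Jupyter this is usually the inline backend; scripts may use a window or no display
print("Backend:", matplotlib.get_backend())
```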

#### 1.0 - Basic Figure and Plotting
Before we create our plots, let's explore some basic Matplotlib functions to understand how they work. We'll experiment with different functions and observe what they do.

In [None]:
plt.plot([1, 2, 3], [1, 2, 3])
plt.title("My Graph")
plt.show()

##### Question 02
What happened when we called `plt.plot()`? Try commenting it out and see what changes.
- **Answer:** `plt.plot()` draws a line through the given data points. If it is commented out, the figure still appears with its title, but the axes are empty because no data is plotted.

##### Question 03
What do the two lists in `plt.plot()` refer to? Try changing the numbers and see what changes.
- **Answer:** The first list gives the x-coordinates and the second list gives the y-coordinates of the data points. Changing these numbers moves the plotted points, and therefore the line, on the graph.

##### Question 04
What does the `plt.title()` function do? What happens if you comment it out?
- **Answer:** The `plt.title()` function adds a title to the plot. If it is commented out, the plot is displayed without a title.
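
Both functions also return objects you can inspect, which is another way to confirm the answers above. In this sketch, `plt.plot()` returns a list of `Line2D` objects (one per line drawn) and `plt.title()` returns the `Text` object holding the title.

```python
import matplotlib.pyplot as plt

# plt.plot returns a list of Line2D objects, one per line drawn
lines = plt.plot([1, 2, 3], [1, 2, 3])
title = plt.title("My Graph")  # the Text object used for the title
ax = plt.gca()                 # the Axes both calls drew on

line = lines[0]
print("x data:", line.get_xdata())   # comes from the first list
print("y data:", line.get_ydata())   # comes from the second list
print("title:", ax.get_title())      # the text set by plt.title
plt.show()
```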

#### 1.2 - Sizing

In [None]:
plt.figure(figsize=(3, 3))
plt.plot([1, 2, 3], [1, 2, 3])
plt.title('Figure #1')

plt.figure(figsize=(10, 5))
plt.plot([1, 2, 3], [1, 2, 3])
plt.title('Figure #2')
plt.show()

##### Question 05
What do the numbers in `figsize=(10, 5)` control? Try changing them and observe what happens.
- **Answer:** The numbers in `figsize=(10, 5)` control the width (10 inches) and height (5 inches) of the figure, respectively. Changing them alters the overall dimensions of the plot.

##### Question 06
In what scenario would changing `figsize` be useful in data visualization?
- **Answer:** Changing `figsize` is useful when you need to adjust the dimensions of a plot to better fit the data, improve readability, or meet specific size requirements for publication or presentation.
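
As a worked check: `figsize` is measured in inches, and the size in pixels also depends on `dpi` (dots per inch). At 100 dpi, a 10 × 5 inch figure is 10 × 100 = 1000 pixels wide and 5 × 100 = 500 pixels tall. The sketch below sets `dpi=100` explicitly as an assumption; your notebook may use a different default.

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches; dpi is pixels per inch
fig = plt.figure(figsize=(10, 5), dpi=100)

width_in, height_in = fig.get_size_inches()
width_px = width_in * fig.dpi
height_px = height_in * fig.dpi
print(f"{width_in} x {height_in} inches -> {width_px:.0f} x {height_px:.0f} pixels")
plt.close(fig)  # we only wanted the measurements, so don't display it
```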

#### 1.3 - Colors and Styles

In [None]:
plt.figure(figsize=(10, 6))

plt.plot([1, 2, 3], [1, 2, 3], color='red')
plt.plot([1, 2, 3], [2, 3, 4], color='blue')
plt.plot([1, 2, 3], [3, 4, 5], color='green', linestyle='--')

plt.title('Different Line Styles')
plt.show()

##### Question 07
What did we change in our `plt.plot()` functions to modify a line's color? What other types of values do you think we could use here? (Think CSS.)
- **Answer:** We changed the `color` parameter in our `plt.plot()` calls to modify each line's color. Besides named colors (e.g., 'red', 'blue', 'green'), Matplotlib also accepts hexadecimal color codes (e.g., '#1f77b4') and RGB tuples with values between 0 and 1 (e.g., (0.2, 0.6, 0.2)).

##### Question 08
What did we change in our `plt.plot()` functions to modify the style of a line? What other linestyles are available for us to use in matplotlib? [Documentation](https://matplotlib.org/stable/).
- **Answer:** We changed the `linestyle` parameter in our `plt.plot()` calls to modify the style of a line. Matplotlib's basic line styles are '-' (solid), '--' (dashed), ':' (dotted), and '-.' (dash-dot); the full names 'solid', 'dashed', 'dotted', and 'dashdot' also work.
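
A small sketch that tries several color formats at once, pairing each with one of the four basic line styles. The specific colors are arbitrary examples; `to_hex` from `matplotlib.colors` is used only to print every color in one common format.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

plt.figure(figsize=(10, 6))
x = [1, 2, 3]

# Four ways to specify a color, each with a different line style
plt.plot(x, [1, 2, 3], color="red", linestyle="-")            # named color, solid
plt.plot(x, [2, 3, 4], color="#1f77b4", linestyle="--")       # hex code, dashed
plt.plot(x, [3, 4, 5], color=(0.2, 0.6, 0.2), linestyle=":")  # RGB tuple (0-1), dotted
plt.plot(x, [4, 5, 6], color="0.5", linestyle="-.")           # grayscale string, dash-dot

ax = plt.gca()
styles = [line.get_linestyle() for line in ax.get_lines()]
for line in ax.get_lines():
    # to_hex converts any accepted color format to a '#rrggbb' string
    print(to_hex(line.get_color()), line.get_linestyle())

plt.title("Color Formats and Line Styles")
plt.show()
```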

#### 1.4 - Labels & Text

In [None]:
plt.figure(figsize=(10, 6))
plt.plot([1, 2, 3], [1, 2, 3])
plt.title('My First Plot', pad=20)
plt.xlabel('X values')
plt.ylabel('Y values')
plt.show()

##### Question 09
What does the pad=20 in `plt.title()` do? Try different numbers.
- **Answer:** The `pad=20` in `plt.title()` sets the gap, in points, between the title and the top of the plot area. Larger numbers push the title further up; smaller numbers bring it closer.

##### Question 10
What do the functions `xlabel` and `ylabel` do? Try commenting them out.
- **Answer:** The functions `xlabel` and `ylabel` add labels to the x-axis and y-axis of the plot, respectively. If they are commented out, the axes have no labels, which makes it harder to understand what each axis represents.
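
The `plt.title`, `plt.xlabel` and `plt.ylabel` calls act on the current Axes (the plot area). As a sketch, here is the same plot written with Matplotlib's object-oriented interface, which the Matplotlib documentation also uses: `plt.subplots` returns the Figure and its Axes, and the `ax.set_*` methods do the same work.

```python
import matplotlib.pyplot as plt

# Same plot as above, but calling methods on the Axes object directly
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot([1, 2, 3], [1, 2, 3])
ax.set_title("My First Plot", pad=20)  # pad: gap above the plot area, in points
ax.set_xlabel("X values")
ax.set_ylabel("Y values")

# The labels can be read back from the Axes
print(ax.get_title(), "|", ax.get_xlabel(), "|", ax.get_ylabel())
plt.show()
```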

#### 1.5 - Backgrounds

In [None]:
plt.figure(figsize=(10, 6), facecolor='lightgray')
plt.plot([1, 2, 3, 4], [1, 4, 2, 3])
plt.title('Gray Figure Background')
plt.show()

##### Question 11
What changed when we added `facecolor='lightgray'` to `plt.figure`?
- **Answer:** Adding `facecolor='lightgray'` to `plt.figure` changed the background color of the entire figure to light gray.

##### Question 12
What didn't change in this example? Hypothesize why the whole graph did not change color.
- **Answer:** The plot area didn't change color in this example. This is because `facecolor` in `plt.figure` only affects the outer area of the figure, not the plot area itself.

Now let's change the background of the plot area (the inner region):

In [None]:
plt.figure(figsize=(10, 6))
ax = plt.gca()  # get current axes
ax.set_facecolor('lightyellow')
plt.plot([1, 2, 3, 4], [1, 4, 2, 3])
plt.title('Yellow Plot Area Background')
plt.show()

##### Question 13
What's the difference between `plt.figure(facecolor='...')` and `ax.set_facecolor('...')`?
- **Answer:** `plt.figure(facecolor='...')` sets the background color of the entire figure, including the area around the plot. `ax.set_facecolor('...')` sets the background color of the plot area only.

##### Question 14
Why have two separate functions for setting the background? Describe a scenario in which that would be useful.
- **Answer:** Having two separate functions for setting the background is useful when you want to create contrast between the plot area and the surrounding figure area. For example, you might want a white plot area for clarity, but a colored background for the overall figure to make it stand out on a page or presentation slide.
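
The reason two functions exist is that the two backgrounds belong to two different objects: the Figure (the whole canvas) and the Axes (the plot area inside it). This sketch sets both, following the white-plot-area scenario above, and reads each color back.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

fig = plt.figure(figsize=(10, 6), facecolor="lightgray")  # Figure: the whole canvas
ax = fig.gca()                                           # Axes: the plot area inside it
ax.set_facecolor("white")                                # plain plot area for contrast
ax.plot([1, 2, 3, 4], [1, 4, 2, 3])
ax.set_title("Gray Figure, White Plot Area")

# Each object stores its own background color
print("figure:", to_hex(fig.get_facecolor()))
print("axes:  ", to_hex(ax.get_facecolor()))
plt.show()
```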

##### Exercise 05
Now that we understand how these functions work, it's your turn!

Add a code cell directly below this one and write code that creates your own plot using every Matplotlib function we have explored so far.
For each function you use, add a comment explaining what it does.

Your graph should be visually pleasing; look online for color-scheme inspiration.
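
For reference, one possible answer is sketched below, reusing the data from section 1.3. The hex colors are just one arbitrary palette; any scheme you find works the same way.

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6), facecolor="#f4f1de")  # figure size (inches) and outer background
ax = plt.gca()                                     # get the current Axes (the plot area)
ax.set_facecolor("#ffffff")                        # inner plot-area background

# plt.plot draws each line; color and linestyle change how it looks
plt.plot([1, 2, 3], [1, 2, 3], color="#e07a5f", linestyle="-")
plt.plot([1, 2, 3], [2, 3, 4], color="#3d405b", linestyle="--")
plt.plot([1, 2, 3], [3, 4, 5], color="#81b29a", linestyle=":")

plt.title("Practice Plot", pad=20)  # title, 20 points above the plot area
plt.xlabel("X values")              # x-axis label
plt.ylabel("Y values")              # y-axis label
plt.show()                          # display the finished figure
```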

### Part 2 - Matplotlib's Graphs
Now that we have the foundations of Matplotlib down, let's explore the various types of graphs we can build.

To answer the questions, edit the markdown cell and put your answer below the question. 

**Make sure to save the markdown cell by pressing the ✓ (check) icon in the top right after answering the questions**

#### 2.0 - Line Plot
Let's start with a simple line plot.

In [None]:
month = [1, 2, 3, 4, 5, 6]
sales = [2, 4, 5, 8, 9, 12]

plt.figure(figsize=(10, 6))
plt.plot(month, sales)
plt.title('Monthly Sales')
plt.xlabel('Month')
plt.ylabel('Sales (thousands)')
plt.grid(True)
plt.show()

##### Question 15
What is the general trend in this data? Is it going up, down, or staying the same?
- **Answer:** The general trend in this data is going up. The sales are increasing over the months.

##### Question 16 
Looking at the numbers on the y-axis, what was the lowest sales amount? What was the highest?
- **Answer:** Looking at the numbers on the y-axis, the lowest sales amount was 2 thousand, and the highest was 12 thousand.
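
The answers to Questions 15 and 16 can also be checked directly from the data with plain Python. This sketch computes the lowest and highest values and the change from each month to the next.

```python
month = [1, 2, 3, 4, 5, 6]
sales = [2, 4, 5, 8, 9, 12]

# Difference between each month and the one before it
changes = [later - earlier for earlier, later in zip(sales, sales[1:])]

print("Lowest:", min(sales), "thousand")
print("Highest:", max(sales), "thousand")
print("Month-to-month changes:", changes)
print("Always increasing:", all(c > 0 for c in changes))
```

Every change is positive, which matches the upward trend visible in the line plot.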

##### Question 17
When would a line plot be useful? Pick one from these scenarios and explain why: <br>
a) Showing temperature changes throughout a day <br>
b) Comparing the number of students in different grades <br>
c) Showing favorite ice cream flavors in your class <br>
- **Answer:** A line plot would be most useful for scenario a) Showing temperature changes throughout a day. This is because temperature changes are continuous data that can be well represented by a line showing the trend over time.


#### 2.1 - Scatter Plot
Let's look at some data about students' study time and test scores:

In [None]:
study_hours = [1, 2, 2, 3, 4, 4, 5, 5, 6]
test_scores = [65, 70, 75, 80, 85, 85, 90, 95, 100]

plt.figure(figsize=(10, 6))
plt.scatter(study_hours, test_scores)
plt.title('Study Time vs. Test Scores')
plt.xlabel('Hours Studied')
plt.ylabel('Test Score')
plt.show()

##### Question 18
Compare the above code block to the line plot and examine what is different. Which function was used to create a scatter plot?
- **Answer:** The function used to create a scatter plot is `plt.scatter()`, as opposed to `plt.plot()` used for the line plot.

##### Question 19
What does each dot in this scatter plot represent?
- **Answer:** Each dot in this scatter plot represents a student, with their study hours on the x-axis and their test score on the y-axis.

##### Question 20
What happens to test scores as study hours increase?
- **Answer:** As study hours increase, test scores tend to increase as well, showing a positive correlation between study time and test performance.
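
To put a number on "positive correlation", one common measure is the Pearson correlation coefficient *r*, which ranges from -1 (perfect downward line) through 0 (no linear trend) to +1 (perfect upward line). A minimal sketch computing it by hand; the helper name `pearson_r` is hypothetical.

```python
from math import sqrt

study_hours = [1, 2, 2, 3, 4, 4, 5, 5, 6]
test_scores = [65, 70, 75, 80, 85, 85, 90, 95, 100]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y move together, relative to how much each varies alone
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson_r(study_hours, test_scores)
print(f"r = {r:.3f}")
```

For this data, *r* comes out close to +1. On Python 3.10 or later, `statistics.correlation` from the standard library computes the same value.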

##### Exercise 06
Add a code block directly below this one and create your own scatter plot about something that interests you. Some ideas:
- Hours of sleep vs. energy level
- Time spent practicing vs. points scored in a game
- Number of videos watched vs. understanding of a topic

Include at least 6 points and proper labels!

#### 2.2 - Bar Plot
Let's look at the favorite pets in a classroom:

In [None]:
categories = ['Dogs', 'Cats', 'Fish', 'Birds']
values = [12, 8, 3, 2]

plt.figure(figsize=(10, 6))
plt.bar(categories, values)
plt.title('Favorite Pets in Class')
plt.xlabel('Pet Type')
plt.ylabel('Number of Students')
plt.show()

##### Question 21
Which pet is the most popular? How many students chose it?
- **Answer:** Dogs are the most popular pet. 12 students chose dogs.

##### Question 22
How many students in total were surveyed?
- **Answer:** In total, 25 students were surveyed.
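
Both answers can be checked in code. This sketch pairs each category with its count in a dictionary, then uses `max` with `key=counts.get` to find the category with the largest count.

```python
categories = ['Dogs', 'Cats', 'Fish', 'Birds']
values = [12, 8, 3, 2]

counts = dict(zip(categories, values))      # {'Dogs': 12, 'Cats': 8, ...}
most_popular = max(counts, key=counts.get)  # key with the largest value
total = sum(values)                         # 12 + 8 + 3 + 2

print(f"Most popular: {most_popular} ({counts[most_popular]} students)")
print(f"Total surveyed: {total}")
```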


##### Exercise 07
Add a code block directly below this one and create your own bar plot showing how you spend your time after school. Include:
- At least 4 different activities
- The number of hours spent on each
- Clear labels saying what the plot shows

#### 2.3 - Histogram
Let's look at the distribution of heights in a class:

In [None]:
heights = [65, 68, 67, 67, 70, 65, 68, 69, 66, 71, 
           67, 68, 68, 69, 70, 67, 66, 67, 68, 69]

plt.figure(figsize=(10, 6))
plt.hist(heights, bins=6, edgecolor='black')
plt.title('Student Heights')
plt.xlabel('Height (inches)')
plt.ylabel('Number of Students')
plt.show()

##### Question 23
What height range appears most common in the class?
- **Answer:** The most common height range appears to be around 67-68 inches.

##### Question 24
About how many students are in the tallest group?
- **Answer:** The tallest group (the rightmost bin, covering about 70-71 inches) has 3 students.

##### Question 25
What does the `bins=6` argument do in the `plt.hist()` function? Try changing the value to see what happens.
- **Answer:** The `bins=6` argument in the `plt.hist()` function determines the number of bars (or bins) in the histogram. Changing this value will adjust how the data is grouped and displayed.

##### Question 26
Why would you want a different number of bins in a histogram?
- **Answer:** Different numbers of bins in a histogram can be useful for revealing different patterns in the data. Fewer bins can show broad trends, while more bins can reveal finer details in the distribution.
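
`plt.hist()` (and `ax.hist()`) returns the bar heights and the bin edges along with the drawn bars, so you can see exactly how the data was grouped. With an integer `bins`, the range from the smallest to the largest value is split into that many equal-width intervals; each bin includes its left edge, and only the last bin also includes its right edge. This sketch compares 6 and 3 bins side by side.

```python
import matplotlib.pyplot as plt

heights = [65, 68, 67, 67, 70, 65, 68, 69, 66, 71,
           67, 68, 68, 69, 70, 67, 66, 67, 68, 69]

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
# hist returns (counts per bin, bin edges, bar artists)
counts6, edges6, _ = axes[0].hist(heights, bins=6, edgecolor='black')
counts3, edges3, _ = axes[1].hist(heights, bins=3, edgecolor='black')
axes[0].set_title('bins=6')
axes[1].set_title('bins=3')

for counts, edges in [(counts6, edges6), (counts3, edges3)]:
    for left, right, n in zip(edges[:-1], edges[1:], counts):
        print(f"{left:.1f}-{right:.1f} in: {int(n)} students")
    print()
plt.show()
```

With 3 bins the middle bar absorbs both 67- and 68-inch students, showing the broad shape; with 6 bins each whole inch gets its own bar.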

### Part 3 - Conclusion
Congratulations! You've completed the Introduction to Jupyter Notebooks and the Matplotlib lab. You should now have a good understanding of how to use Jupyter Notebooks and create various types of plots using Matplotlib.

For each data scenario below, determine which graph would best represent it and explain why.

##### Question 27
Which graph would best suit showing how temperatures change during a week?
- **Answer:** A line graph would be best suited for showing how temperatures change during a week. This allows for a clear visualization of the temperature trend over time.

##### Question 28
Which graph would best suit showing the number of students who chose different sports?
- **Answer:** A bar graph would be best suited for showing the number of students who chose different sports. This allows for easy comparison between discrete categories.

##### Question 29
Which graph would best suit showing the relationship between time spent on social media and grades?
- **Answer:** A scatter plot would be best suited for showing the relationship between time spent on social media and grades. This allows for visualization of potential correlation between two continuous variables.
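
As a recap, this sketch draws the three chart types side by side using the data from Part 2, with a comment on what each type is suited for. It uses `plt.subplots(1, 3)` to place three Axes in one Figure.

```python
import matplotlib.pyplot as plt

fig, (ax_line, ax_bar, ax_scatter) = plt.subplots(1, 3, figsize=(15, 4))

# Line plot: a value changing over an ordered axis such as time
ax_line.plot([1, 2, 3, 4, 5, 6], [2, 4, 5, 8, 9, 12])
ax_line.set_title('Line: Monthly Sales')

# Bar plot: comparing counts across separate categories
ax_bar.bar(['Dogs', 'Cats', 'Fish', 'Birds'], [12, 8, 3, 2])
ax_bar.set_title('Bar: Favorite Pets')

# Scatter plot: the relationship between two numeric variables
ax_scatter.scatter([1, 2, 2, 3, 4, 4, 5, 5, 6],
                   [65, 70, 75, 80, 85, 85, 90, 95, 100])
ax_scatter.set_title('Scatter: Study Time vs. Score')

fig.tight_layout()  # keep titles and tick labels from overlapping
plt.show()
```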