# 📊 Introduction to Python for Data

## What is Data?

<b>Data</b> refers to a collection of facts, figures, or information. We usually collect data from a sample (a small part of a group) to learn about the whole group, which is called a population.

Some examples of collecting data include the following:
- Surveys
- Observations
- Experiments
- Photo and Video Collection

### Now that we understand data, let's create a <span style="color:purple">variable</span> to hold some random data 🤡 of our own!

### 💡 **What is a <span style="color:purple">variable</span> ?**
> Variables are containers for storing data values. The data they hold is usually one of two types:
>> **Numerical** = numbers (ex: height, test score) <br> **Categorical** = groups (ex: favorite color, type of pet)

In [50]:
# Here is a random list of test scores on a math test.

math_scores = [88, 92, 75, 85, 85, 78, 95, 83, 70, 87,
               88, 93, 76, 89, 88, 91, 77, 88, 94, 82]
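
The two kinds of data described above can be sketched in Python like this. The names `heights_cm` and `favorite_colors` and their values are made up for illustration; they are not part of the lesson's data.

```python
# Numerical data: numbers we can do math with (hypothetical values)
heights_cm = [150, 162, 158]

# Categorical data: labels that put things into groups (hypothetical values)
favorite_colors = ["blue", "green", "blue"]

# Numbers can be added together; labels can be counted
total_height = sum(heights_cm)
blue_count = favorite_colors.count("blue")

print(f"Total height: {total_height}")   # Total height: 470
print(f"Votes for blue: {blue_count}")   # Votes for blue: 2
```

Notice that adding makes sense for heights, while counting makes more sense for colors.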

## What is Statistics?

<p1><b>Statistics</b> is a branch of mathematics that deals with collecting and analyzing data. In real life, we can use a group of numbers to help us <b>understand the world</b> around us 🌎. </p1>

Here are some questions we can answer (from micro-level 🔍 to macro-level 🔭) using the basic concepts of statistics:

- How many hours a day on average do I spend on my phone?
- Which sport do most middle schoolers play outside of gym class?
- Who was the most-listened-to artist globally in 2024?

<p1>When you're curious about how the world works, statistics gives you the tools to find real answers instead of just guessing.</p1>

## <span style="color:blue">Icebreaker (5 min):</span> Come up with your own "curiosity questions" that can be answered with statistics.

### Now that our brains are activated 🧠, let's apply some basic statistics concepts using a Python <span style="color:orange">library</span> 📚.

### 💡 **What is a <span style="color:orange">library</span> ?**
> A library is a collection of pre-written, reusable code that developers can use to perform specific tasks.
>> Example of a library: [NumPy](https://numpy.org/doc/2.1/reference/routines.statistics.html): a fundamental library for scientific computing in Python (aka a super powerful calculator ⚡🧮).

In [51]:
# To use a library, we first need to import it like this!
# "np" is a short nickname, so we can type np instead of numpy.
import numpy as np

### Next, let's use <span style="color:green">functions</span> 🧪 from the NumPy <span style="color:orange">library</span> 📚 to better understand our math_scores <span style="color:purple">variable</span>.

### 💡 **What is a <span style="color:green">function</span> ?**
> A function is a reusable block of code that takes an input and gives back an output. Example functions include the following:
>> <b>np.min():</b> finds the smallest number in a list. <br><b>np.max():</b> finds the biggest number in a list.

[ Input ]  →  [  Function Machine  ]  →  [ Output ]
<br>[3,7,1,4]  "Find the smallest number in the list"       1
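
The function machine above can be tried out directly. This is a small sketch that uses the same input list as the diagram; the variable names `numbers`, `smallest`, and `biggest` are just chosen for this example.

```python
import numpy as np

numbers = [3, 7, 1, 4]          # the input from the diagram above

smallest = np.min(numbers)      # the function does the searching for us
biggest = np.max(numbers)

print(f"Smallest: {smallest}")  # Smallest: 1
print(f"Biggest: {biggest}")    # Biggest: 7
```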

In [55]:
#Tip: You can call the built-in help() function on any function to see what it does!
help(np.min)

Help on function amin in module numpy:

amin(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
    Return the minimum of an array or minimum along an axis.
    
    Parameters
    ----------
    a : array_like
        Input data.
    axis : None or int or tuple of ints, optional
        Axis or axes along which to operate.  By default, flattened input is
        used.
    
        .. versionadded:: 1.7.0
    
        If this is a tuple of ints, the minimum is selected over multiple axes,
        instead of a single axis or all the axes as before.
    out : ndarray, optional
        Alternative output array in which to place the result.  Must
        be of the same shape and buffer length as the expected output.
        See :ref:`ufuncs-output-type` for more details.
    
    keepdims : bool, optional
        If this is set to True, the axes which are reduced are left
        in the result as dimensions with size one. With this option,
        the result

In [56]:
# Calculate mean
mean_score = np.mean(math_scores)

# Print the results
print(f"Mean (average): {mean_score}")

Mean (average): 85.2
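
To see what `np.mean()` is doing behind the scenes, here is a sketch that works out the same average by hand: add up all the scores, then divide by how many there are. The names `total`, `count`, and `manual_mean` are chosen just for this example.

```python
import numpy as np

math_scores = [88, 92, 75, 85, 85, 78, 95, 83, 70, 87,
               88, 93, 76, 89, 88, 91, 77, 88, 94, 82]

total = sum(math_scores)       # add up every score
count = len(math_scores)       # how many scores there are
manual_mean = total / count    # mean = total divided by count

print(f"Total: {total}, Count: {count}")   # Total: 1704, Count: 20
print(f"Mean by hand: {manual_mean}")      # Mean by hand: 85.2

# Check that our answer matches NumPy's
print(np.isclose(manual_mean, np.mean(math_scores)))  # True
```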


In [58]:
#Your turn! Let's calculate the median using the np.median() function

#Calculate median

#Print the results


### ❗ Sometimes, you may need to combine a couple of <span style="color:green">functions</span> from a <span style="color:orange">library</span> to get exactly what you need.

In [68]:
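# Calculate the mode manually (the most frequent number)
# np.unique gives each different score (sorted) and how many times it appears
values, counts = np.unique(math_scores, return_counts=True)

# Put the scores and their counts side by side in two columns
table = np.column_stack((values, counts))
print("Frequency of scores:")
print(table)

# np.argmax gives the position of the biggest count
# (if there is a tie, it picks the first one, which is the smallest score)
mode_score = values[np.argmax(counts)]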

print(f"\nMode (most frequent value): {mode_score}")

Frequency of scores:
[[70  1]
 [75  1]
 [76  1]
 [77  1]
 [78  1]
 [82  1]
 [83  1]
 [85  2]
 [87  1]
 [88  4]
 [89  1]
 [91  1]
 [92  1]
 [93  1]
 [94  1]
 [95  1]]

Mode (most frequent value): 88
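
One thing to watch for with the `np.argmax` approach: if two scores are tied for most frequent, it only reports the first one. Here is a sketch, using a made-up list called `tied_scores`, of one possible way to find every mode.

```python
import numpy as np

tied_scores = [90, 90, 80, 80, 70]   # hypothetical: 80 and 90 both appear twice

values, counts = np.unique(tied_scores, return_counts=True)

# argmax stops at the first biggest count, so only one mode is reported
first_mode = values[np.argmax(counts)]

# Keep every value whose count equals the biggest count
all_modes = values[counts == counts.max()]

print(f"First mode only: {first_mode}")  # First mode only: 80
print(f"All modes: {all_modes}")         # All modes: [80 90]
```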
