<H1>SI 618 Day 01 - Introduction</H1>
Dr. Chris Teplovs, University of Michigan School of Information

Copyright &copy; 2024.  This notebook may not be shared outside of the course without permission.

This notebook is a very brief introduction to Jupyter notebooks.  We will use Jupyter notebooks for all of our work in this course.

Notebook version 2024.01.10.4.CT

## Learning Objectives
By the end of this class, you should:
* have confirmed that you have a working Jupyter environment in Visual Studio Code (VS Code)
* be able to open and edit a Jupyter notebook
* be able to run a Jupyter notebook
* have written your first code in this class
* have experimented with Copilot
* have successfully submitted an assignment to Canvas

You will be working in this notebook. When we are done, you will submit this notebook in two formats: HTML and IPYNB.

Jupyter notebooks consist of two main types of "cells" or "blocks" (I use those terms interchangeably): code and markdown.  There are other types, but we won't be using them in this course.  Code blocks contain (Python) code, whereas markdown blocks contain text.  Use markdown blocks to create richer narratives around your code.

For more information about what constitutes a good Jupyter notebook, please read Adam Rule, et al. [Ten simple rules for writing and sharing computational analyses in Jupyter Notebooks](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007007).

In this course, we also expect you to conform to the PEP 8 style guidelines: [PEP 8 Style Guide for Python Code](https://pep8.org/), which is adapted from the original Python Enhancement Proposal, [PEP 8](https://peps.python.org/pep-0008/). You will earn fewer points for your assignments if you do not follow this style guide.

**Before you start**: Make sure you have selected your `.venv` Python environment.

One of the first things we want to do in our notebooks is our `import`s.  We'll import the `csv` module, which we'll use a bit later in the notebook:


In [3]:
import csv

### Challenge 1: Write code that prints out "Hello, world!" (don't overthink this one)

In [4]:
# insert your code here
print("Hello, world!")

Hello, world!


### Challenge 2: Sum of squares
Create a function that calculates the sum of squares of any sequence of numbers.  Test it on the following values: 1,3,5,7,9.  The answer should be 165.  We'll improve the `assert` statement during class, but for now, just use it as shown.

In [5]:
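def sum_of_squares(seq):
    """Return the sum of the squares of the numbers in seq."""
    total = 0  # named total to avoid shadowing the built-in sum()
    for i in seq:
        total += i**2
    return total


assert sum_of_squares([1, 3, 5, 7, 9]) == 165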
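One possible way to improve the `assert` statement, offered here as a guess at what we will cover in class rather than the definitive answer, is to attach a message that reports the value actually returned. The sketch below redefines the function with a generator expression so that the block runs on its own; `values` and `result` are names chosen for illustration.

```python
def sum_of_squares(seq):
    """Return the sum of the squares of the numbers in seq."""
    return sum(x**2 for x in seq)


values = [1, 3, 5, 7, 9]
result = sum_of_squares(values)
# The message after the comma is shown only if the assertion fails.
assert result == 165, f"Expected 165 for {values}, got {result}"
```

Without the message, a failing `assert` raises a bare `AssertionError`, which tells you that something went wrong but not what the function returned.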

### Challenge 3: Documentation
Add documentation to the following function using docstrings and comments.  (Hint: see https://en.wikipedia.org/wiki/Fibonacci_sequence or leverage GitHub Copilot Chat.)

In [6]:
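# The Fibonacci sequence starts with 0 and 1; each later number is the
# sum of the previous two.


def f(x):
    """Return the xth Fibonacci number, counting from f(0) == 0."""
    if x == 0:  # first base case
        return 0
    elif x == 1:  # second base case
        return 1
    else:  # recursive case: add the two preceding Fibonacci numbers
        return f(x - 1) + f(x - 2)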
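A docstring is more than a comment: Python stores it on the function, so tools such as `help()` and editor tooltips can display it. The following is a minimal sketch of that idea, using a hypothetical, more descriptive name (`fibonacci`) and a type hint that are not part of the original function.

```python
def fibonacci(n: int) -> int:
    """Return the nth Fibonacci number, where fibonacci(0) == 0."""
    if n < 2:  # base cases: fibonacci(0) == 0 and fibonacci(1) == 1
        return n
    # Recursive case: sum of the two preceding Fibonacci numbers.
    return fibonacci(n - 1) + fibonacci(n - 2)


# The docstring is available at runtime through the __doc__ attribute.
print(fibonacci.__doc__)
print(fibonacci(10))  # 55
```

Comments, by contrast, are discarded when the code runs, so use them for notes about how the code works and the docstring for what the function does.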

### Challenge 4: String manipulation, documentation and type hints
Write a function that takes a string as input and returns the number of vowels in the string.  Test it on the following string: "The quick brown fox jumped over the lazy dog."  The answer should be 12.  Document your function.  Include [type hints (see PEP-484)](https://peps.python.org/pep-0484/).

In [7]:
def count_vowels(s: str) -> int:
    """Return the number of vowels (a, e, i, o, u) in s, ignoring case."""
    lowered = s.lower()  # so that uppercase vowels are counted too
    return sum(lowered.count(c) for c in "aeiou")


assert count_vowels("hello") == 2, f"There are two vowels in hello, not {count_vowels('hello')}"
assert count_vowels("The quick brown fox jumped over the lazy dog") == 12, f"There are 12 vowels in the quick brown fox jumped over the lazy dog, not {count_vowels('The quick brown fox jumped over the lazy dog')}"


### Challenge 5: Find the mode, write some tests.  
The mode is the most frequently occurring value.  If there are multiple modes, you only need to return one of them.  Document and type-hint your code. Test your code with an assert statement for the following values: 1, 2, 3, 3, 4, 5, 5, 5, 6, 6, 7, 8, 9, 9, 10.

Try to do this on your own without resorting to Googling for a solution.

In [19]:
from typing import List

def find_mode(numbers: List[int]) -> int:
    """
    Finds the mode of a list of numbers. 
    The mode is the most frequently occurring value.
    In the case of multiple modes, only one is returned.

    :param numbers: List of integers representing the dataset.
    :return: An integer representing the mode of the list.
    """
    frequency = {}  # Dictionary to store the frequency of each number
    max_count = 0   # Variable to keep track of the maximum frequency
    mode = None     # Variable to store the mode

    # Iterate over the list to count the frequency of each number
    for number in numbers:
        if number in frequency:
            frequency[number] += 1
        else:
            frequency[number] = 1

        # Update mode if the current number's frequency is higher
        if frequency[number] > max_count:
            max_count = frequency[number]
            mode = number

    return mode

assert find_mode([1, 2, 3, 3, 4, 5, 5, 5, 6, 6, 7, 8, 9, 9, 10]) == 5
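As a cross-check on the hand-written loop, the standard library's `collections.Counter` offers a more compact route. This is only a sketch of an alternative, not the expected solution; `find_mode_counter` is a hypothetical name, and raising `ValueError` for an empty list is an assumption about how that case should be handled.

```python
from collections import Counter
from typing import List


def find_mode_counter(numbers: List[int]) -> int:
    """Return one mode of numbers (hypothetical helper for comparison)."""
    if not numbers:
        raise ValueError("numbers must not be empty")
    # most_common(1) returns a one-element list holding the
    # (value, count) pair with the highest count.
    value, _count = Counter(numbers).most_common(1)[0]
    return value


assert find_mode_counter([1, 2, 3, 3, 4, 5, 5, 5, 6, 6, 7, 8, 9, 9, 10]) == 5
# With a tie, either mode is acceptable, so test membership, not equality.
assert find_mode_counter([1, 1, 2, 2]) in {1, 2}
```

The second assertion shows one way to test the multiple-mode case without depending on which mode happens to be returned.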


### Challenge 6: Calculate the mean temperature for the data in aranet4.csv
Your output should consist of the following:

```Mean temperature: X.XX```

where `X.XX` is the mean temperature rounded to two decimal places.  For example: 27.23 or 4.32 or -1.20.
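One way to get exactly two decimal places is an f-string format specifier. The sketch below uses a made-up value rather than the real data, and `format_mean` is a hypothetical helper name. It also shows why `round()` alone is not enough for display: it drops trailing zeros.

```python
def format_mean(mean_temp: float) -> str:
    """Return the output line with the mean shown to two decimal places."""
    return f"Mean temperature: {mean_temp:.2f}"


# Illustrative value only, not taken from aranet4.csv.
print(format_mean(-1.2))  # Mean temperature: -1.20
print(round(-1.2, 2))     # -1.2 (the trailing zero is lost)
```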

**NOTE: You should use only base python and the `csv` module to solve this problem.  Do not use pandas or any other libraries.**

The data file should be located in the `data` directory that is a sibling of the directory containing this notebook.  For example, if this notebook is located in `SI_618_WN_24_Files/inclass`, then the data file should be located in `SI_618_WN_24_Files/data`.
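If you are unsure whether your files are laid out this way, the relationship between the two directories can be sketched with `pathlib`. The directory names come from the example above; `data_path` is a hypothetical helper, not something the course provides.

```python
from pathlib import Path


def data_path(notebook_dir: Path, name: str = "aranet4.csv") -> Path:
    """Return the path to a file in the data directory beside notebook_dir."""
    # notebook_dir.parent is the folder that holds both 'inclass' and 'data'.
    return notebook_dir.parent / "data" / name


path = data_path(Path("SI_618_WN_24_Files/inclass"))
print(path.as_posix())  # SI_618_WN_24_Files/data/aranet4.csv
print(path.exists())    # True only if the file is present on your machine
```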

In [9]:
filename = '../data/aranet4.csv'

In [10]:
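def aranet_mean_temp(filename: str) -> float:
    """Return the mean of the Temperature(°C) column, rounded to two decimals."""
    with open(filename, newline='') as csvfile:
        reader = csv.DictReader(csvfile)  # maps each row to {column name: value}
        temps = [float(row['Temperature(°C)']) for row in reader]
    return round(sum(temps) / len(temps), 2)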

assert aranet_mean_temp(filename) == 17.75, "The mean temperature should be 17.75"
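To see how `csv.DictReader` uses the header row as dictionary keys without needing the data file, here is a sketch that reads a tiny in-memory sample. The two data rows are made up for illustration; only the temperature column name is taken from the code above.

```python
import csv
import io

# Made-up sample; the real file has more columns and many more rows.
sample = "Temperature(°C),Relative humidity(%)\n18.0,24\n19.5,25\n"

# io.StringIO lets the csv module read a string as if it were a file.
reader = csv.DictReader(io.StringIO(sample))
# Values arrive as strings, so convert each one to float before averaging.
temps = [float(row["Temperature(°C)"]) for row in reader]
print(f"Mean temperature: {sum(temps) / len(temps):.2f}")  # Mean temperature: 18.75
```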

Just for fun, here's how to do the above challenge using pandas:

In [11]:
import pandas as pd
aranet_df = pd.read_csv(filename)
print(f"Mean temperature: {aranet_df['Temperature(°C)'].mean():.2f}")

Mean temperature: 17.75


In [12]:
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})



In [13]:
aranet_df.describe()

| | Carbon dioxide(ppm) | Temperature(°C) | Relative humidity(%) | Atmospheric pressure(hPa) |
|---|---|---|---|---|
| count | 5773.0 | 5773.0 | 5773.0 | 5773.0 |
| mean | 630.763381 | 17.754062 | 24.391478 | 1000.93383 |
| std | 176.137812 | 1.803766 | 3.507078 | 9.760638 |
| min | 441.0 | 7.3 | 15.0 | 983.0 |
| 25% | 510.0 | 16.7 | 22.0 | 992.0 |
| 50% | 611.0 | 18.3 | 24.0 | 1002.0 |
| 75% | 684.0 | 18.8 | 26.0 | 1009.0 |
| max | 2263.0 | 22.2 | 52.0 | 1017.0 |


In [14]:
df1.describe()

| | a | b |
|---|---|---|
| count | 3.0 | 3.0 |
| mean | 2.0 | 5.0 |
| std | 1.0 | 1.0 |
| min | 1.0 | 4.0 |
| 25% | 1.5 | 4.5 |
| 50% | 2.0 | 5.0 |
| 75% | 2.5 | 5.5 |
| max | 3.0 | 6.0 |


After we review these challenges, we'll take a look at some basic pandas DataFrame functionality.


## END OF NOTEBOOK
Remember to submit this notebook to Canvas in both HTML and IPYNB formats.  Note: if you have difficulty exporting to HTML, just submit the IPYNB file and make sure you reach out to the teaching team to get help.  You will only receive partial credit if you do not submit both formats (HTML and IPYNB) by the due date (see Canvas for due date).