<a href="https://colab.research.google.com/github/adampick99/study-notes/blob/main/AppliedDataScienceInPython_Notes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Applied Data Science in Python - University of Michigan

Start: 17/06/2023 14:20

## Week 1 - Fundamentals of Data Manipulation with Python

This course will introduce the learner to the basics of the Python programming environment, including fundamental Python programming techniques such as **lambdas, reading and manipulating CSV files, and the numpy library**. The course will introduce data manipulation and cleaning techniques using the popular Python pandas data science library and introduce the **abstraction of the Series and DataFrame as the central data structures for data analysis**, along with tutorials on how to use functions such as **groupby, merge, and pivot tables** effectively. By the end of this course, students **will be able to take tabular data, clean it, manipulate it, and run basic inferential statistical analyses**.

## Python Functions

You can give a function optional parameters by assigning a default value in the signature, for example z=None. Parameters with defaults must come AFTER the required parameters in the def.

In [None]:
def add_numbers(x, y, z=None): # This can add a maximum of 3 numbers, and a minimum of 2.
  if z is None:  # 'is' is the idiomatic way to compare against None
    return x+y
  else:
    return x+y+z

In [None]:
add_numbers(2,4)

6

Assigning z = None in the definition makes None the default value, which is used whenever no argument is passed for that parameter. You can pass None explicitly or simply omit the argument.
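
As a quick check of the default-value behaviour, here is a minimal sketch that redefines add_numbers so it runs on its own and calls it in the three possible ways. The variable names are only for illustration.

```python
def add_numbers(x, y, z=None):
    """Add two numbers, or three if z is supplied."""
    if z is None:
        return x + y
    return x + y + z

two_args = add_numbers(2, 4)              # z falls back to its default, None
explicit_none = add_numbers(2, 4, None)   # same result as omitting z
three_args = add_numbers(2, 4, 6)         # z is included in the sum

print(two_args, explicit_none, three_args)
```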

The type() function can be used to check the data type of a python object. Tuples, lists and dictionaries are what we care about.

A tuple is an immutable sequence of values (it can't be changed after it is created).

Tuples are declared by using ( ), lists by [ ], and dictionaries by { }, for example: {key: value, key2: value}.

Keys can be any hashable (immutable) object, such as strings, integers, floats or tuples. Lists and dictionaries can't be used as keys.
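
A small sketch of type() on the three container types, plus the mutability difference between lists and tuples. All names and values here (point, scores, ages, grid) are made up for illustration.

```python
point = (1, 2)          # tuple: immutable
scores = [1, 2]         # list: mutable
ages = {'Jeff': 21}     # dict: maps keys to values

print(type(point))
print(type(scores))
print(type(ages))

scores[0] = 99          # fine, lists can be changed in place

try:
    point[0] = 99       # tuples cannot be changed after creation
except TypeError as err:
    print('TypeError:', err)

# A tuple can be a dictionary key because it is hashable; a list cannot.
grid = {(0, 0): 'origin'}
try:
    grid[[0, 1]] = 'not allowed'
except TypeError as err:
    print('TypeError:', err)
```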

## Manipulating Strings
### Slicing

In Python, the slice [0:5] returns 5 elements (indices 0 to 4). The element at index 5 isn't included because the end of a slice is exclusive, not inclusive.

This is the same for going backwards. [-5:-3] will output the 5th last and 4th last values, but not the 3rd last as it's exclusive.

[:3] is everything up to, but not including, index 3 (so the first three values).

[3:] is everything from and including the 3rd index value (4th value as indexing begins at 0).
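
The slicing rules above can be tried out on a short string. This is just a sketch; the variable names are arbitrary.

```python
word = 'Christopher'

first_five = word[0:5]     # indices 0..4, index 5 is excluded
back_slice = word[-5:-3]   # 5th-last and 4th-last characters only
up_to_three = word[:3]     # indices 0, 1, 2
from_three = word[3:]      # index 3 through to the end

print(first_five)    # Chris
print(back_slice)    # op
print(up_to_three)   # Chr
print(from_three)    # istopher
```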

### Split - Regular Expression Evaluation
(Split strings based on substrings)



In [None]:
firstname = 'Christopher'
lastname = 'Brooks'

print(firstname + ' ' + lastname)
print(firstname * 3)
print('Chris' in firstname) # The in operator can be used to search in a string, and here we're searching for 'Chris'

Christopher Brooks
ChristopherChristopherChristopher
True


In [None]:
firstname = 'Christopher Arthur Hansen Brooks'.split(' ')[0]  # [0] selects the first element of the list
lastname = 'Christopher Arthur Hansen Brooks'.split(' ')[-1]  # [-1] selects the last element of the list
print(firstname)
print(lastname)

Christopher
Brooks


Make sure to convert non-string objects to strings with str() before concatenating them to a string with +.
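
A minimal sketch of what goes wrong without the conversion and how str() fixes it. The values are made up for illustration.

```python
name = 'Chris'
num_items = 4

# Concatenating a str and an int directly raises a TypeError.
try:
    message = name + ' bought ' + num_items
except TypeError as err:
    error_text = str(err)
    print('TypeError:', error_text)

# Converting the number with str() first makes the concatenation work.
message = name + ' bought ' + str(num_items) + ' items'
print(message)
```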

## Dictionaries

Dictionaries are objects that store keys and values. They are defined by using curly brackets { }. You can use the **.values()** method to output just the values, not the keys. To output just the keys, use **.keys()**. To output both, use **.items()**.

You retrieve the value assigned to a label/key by indexing the dictionary with that key. So if ages = {'Jeff': 21} is the dictionary, then ages['Jeff'] returns 21. (Avoid naming a variable dict, as that hides Python's built-in dict type.)
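
Here is a short sketch of the three view methods and a key lookup, using a made-up dictionary called ages. The views are wrapped in list() only so they print as plain lists.

```python
ages = {'Jeff': 21, 'Anna': 34}

keys = list(ages.keys())       # just the keys
values = list(ages.values())   # just the values
items = list(ages.items())     # (key, value) pairs as tuples

print(keys)
print(values)
print(items)

jeff_age = ages['Jeff']        # look up a value by its key
print(jeff_age)
```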

In [None]:
x = {'Christopher Brooks': 'brooksch@umich.edu', 'Bill Gates': 'billg@microsoft.com'}

for key, value in x.items():
    print(key)
    print(value)

# This will output each key, followed by its corresponding value.

Christopher Brooks
brooksch@umich.edu
Bill Gates
billg@microsoft.com


## Tuples

You can unpack a tuple into separate variables in a single assignment. Remember, tuples are immutable. See below:

In [1]:
# Say we have a tuple (parentheses create a tuple, not a list).
# It isn't named 'list', because that would hide Python's built-in list().
name_parts = ('Tyrion', 'Lannister', 'Targaryen')

# Unpacking assigns each element of the tuple to its own variable, in order.
fore, sur, house = name_parts
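
To see what unpacking produces, and that the tuple itself stays unchanged, here is a small self-contained sketch. The name character is chosen only for this example.

```python
character = ('Tyrion', 'Lannister', 'Targaryen')

# One variable per element, assigned in order.
fore, sur, house = character
print(fore)
print(sur)
print(house)

# Tuples are immutable, so item assignment raises a TypeError.
try:
    character[0] = 'Jaime'
except TypeError as err:
    print('TypeError:', err)
```

The number of variables on the left must match the number of elements in the tuple, otherwise Python raises a ValueError.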

## More on String Manipulation

We can use the format() string method to fill in string templates using other objects. Take a look:



In [None]:
sales_record = {
    'price': 3.24,
    'num_items': 4,
    'person': 'Chris'}

sales_statement = '{} bought {} item(s) at a price of {} each for a total of {}'

# The format function works on a template string to replace sections of the string with values/strings from another Python object (sales_record here)
print(sales_statement.format(sales_record['person'],
                             sales_record['num_items'],
                             sales_record['price'],
                             sales_record['num_items'] * sales_record['price']))

Chris bought 4 item(s) at a price of 3.24 each for a total of 12.96
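
As a variation on the cell above, format() also accepts named placeholders, which can be filled straight from a dictionary with ** unpacking. This is only a sketch: the placeholder name total and the .2f format specifier are additions not used in the original cell.

```python
sales_record = {
    'price': 3.24,
    'num_items': 4,
    'person': 'Chris'}

# Named placeholders are matched to keyword arguments by name.
template = ('{person} bought {num_items} item(s) at a price of {price} '
            'each for a total of {total:.2f}')

statement = template.format(
    total=sales_record['num_items'] * sales_record['price'],
    **sales_record)  # unpacks price, num_items and person as keywords

print(statement)
```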


## Reading and Writing CSV Files

Let's import our datafile mpg.csv, which contains fuel economy data for 234 cars.

mpg : miles per gallon
class : car classification
cty : city mpg
cyl : # of cylinders
displ : engine displacement in liters
drv : f = front-wheel drive, r = rear wheel drive, 4 = 4wd
fl : fuel (e = ethanol E85, d = diesel, r = regular, p = premium, c = CNG)
hwy : highway mpg
manufacturer : automobile manufacturer
model : model of car
trans : type of transmission
year : model year

In [None]:
import csv

%precision 2

with open('datasets/mpg.csv') as csvfile:
    mpg = list(csv.DictReader(csvfile))

mpg[:3]  # The first three dictionaries in our list.

**csv.DictReader has read in each row of our csv file as a dictionary.** len(mpg) shows that our list contains 234 dictionaries. The column names from the header row become the keys of each dictionary, and the entries of that row become the values.
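
The sketch below shows the same idea without needing the data file. It reads a tiny in-memory CSV whose rows are made up for illustration (they are NOT taken from mpg.csv), then averages one column. Note that csv.DictReader returns every value as a string, so numbers need converting before any arithmetic.

```python
import csv
import io

# Made-up sample rows for illustration only; not real mpg.csv data.
sample_csv = io.StringIO(
    "manufacturer,model,cty,hwy\n"
    "acme,roadster,18,29\n"
    "acme,wagon,21,30\n"
    "globex,hauler,15,20\n"
)

rows = list(csv.DictReader(sample_csv))

print(len(rows))              # number of data rows (the header is not counted)
print(list(rows[0].keys()))   # column names taken from the header row
print(rows[0])                # first row as a dictionary of strings

# Convert the string values to float before averaging.
avg_cty = sum(float(row['cty']) for row in rows) / len(rows)
print(avg_cty)
```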

## Advanced Python Objects

### Object Oriented Programming

Python Documentation will explain this much better, but OOP isn't really worth studying yet for Data Science.

### Maps - map()

The map function is one of the bases for functional programming in Python.

map(function, iterable, ...)

This function allows you to apply a function to each item of an iterable.

The map function returns a map object, not the output values themselves, so you won't see the results immediately. You can iterate over the map object (or pass it to list()) to get each value. Because the values are computed lazily, one at a time, map is memory-efficient, which is why it's commonly used when working with large datasets.
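
A minimal sketch of map() in use. The function double and the two price lists are made up for illustration.

```python
def double(n):
    """Return n multiplied by two."""
    return n * 2

numbers = [1, 2, 3]

doubled = map(double, numbers)
print(doubled)                     # a map object, not the values themselves

doubled_list = list(doubled)       # iterating produces the values
print(doubled_list)

# A map object can only be consumed once; it is empty afterwards.
leftover = list(doubled)
print(leftover)

# With several iterables, the function receives one item from each in parallel.
store1 = [10.0, 11.0, 12.34]
store2 = [9.0, 11.1, 12.34]
cheapest = list(map(min, store1, store2))
print(cheapest)
```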


## List Comprehensions

List comprehension offers a shorter syntax when you want to create a new list based on the values of an existing list.

Example:

Based on a list of fruits, you want a new list, containing only the fruits with the letter "a" in the name.

Without list comprehension you will have to write a for statement with a conditional test inside:

In [None]:
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = []

for x in fruits:
  if "a" in x:
    newlist.append(x)

print(newlist)

With list comprehension you can do all that with only one line of code:

In [None]:
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]

newlist = [x for x in fruits if "a" in x] # The first part is the value placed into the new list
# The part after 'for' is what you iterate over
# The optional 'if' part filters which items are kept
# You can nest loops by using multiple fors, e.g. for x in fruits1 for y in fruits2
# iterates over every item of fruits2 for each item of fruits1.

print(newlist)
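
The nested form mentioned in the comments can be sketched as follows. The lists colours and small_fruits are made up for illustration, and each for clause uses its own loop variable.

```python
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]

# Filtered comprehension, as in the cell above.
with_a = [x for x in fruits if "a" in x]
print(with_a)

# Two for clauses behave like nested loops: the second runs fully
# for each item of the first.
colours = ["red", "green"]
small_fruits = ["apple", "kiwi"]
pairs = [(c, f) for c in colours for f in small_fruits]
print(pairs)
```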