# <center>LECTURE OVERVIEW</center>

---

## By the end of the day you'll be able to:
- write conditional statements using boolean logic
- import modules/packages
- describe why the `pandas` module is useful
- create a dataframe and summarize it

# <center>BOOLEAN LOGIC</center>

---

## <font color='LIGHTGRAY'>By the end of the day you'll be able to:</font>
- **write conditional statements using boolean logic**
- <font color='LIGHTGRAY'>import modules/packages</font>
- <font color='LIGHTGRAY'>describe why the pandas module is useful</font>
- <font color='LIGHTGRAY'>create a dataframe and summarize it</font>
    
|Operator     |What it means                       |What it looks like|
|-------------|------------------------------------|------------------|
|`and`        |True if both are true               |`x and y`         |
|`or`         |True if at least one is true        |`x or y`          |
|`not`        |True if `x` is false, and vice versa|`not x`           |

# AND

In [1]:
print(True and True)
print(True and False)
print(False and True)
print(False and False)

True
False
False
False


# OR

In [None]:
print(True or True)
print(True or False)
print(False or True)
print(False or False)

# NOT

In [None]:
print(not True)
print(not False)

# Some more exciting examples...

In [2]:
print((5 > 3) and (5 < 9))
print((5 > 3) and (5 > 9))
print((5 < 3) and (5 < 9))
print((5 < 3) and (5 > 9))

True
False
False
False


In [3]:
print(
    not ((-0.2 > 1.4) and ((0.8 < 3.1) or (0.1 == 0.1)))
)

True
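
A nested expression like the one above is easier to follow when it is evaluated piece by piece. The sketch below does that; the variable names `left`, `right`, `inner`, and `result` are made up for illustration.

```python
# evaluate the nested expression one step at a time
left = (-0.2 > 1.4)                   # False: -0.2 is smaller than 1.4
right = (0.8 < 3.1) or (0.1 == 0.1)   # True: at least one side is true
inner = left and right                # False: `and` needs both to be true
result = not inner                    # True: `not` flips False to True

print(left, right, inner, result)
```

Because `left` is already False, the `and` can never be True, whatever `right` turns out to be.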


In [4]:
age = 59
print((type(age) != float) or (type(age) != int))
print((type(age) != float) and (type(age) != int))

True
False
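
Boolean expressions become conditional statements once they are placed in an `if`. Below is a minimal sketch that reuses `age` from the cell above; the age thresholds and the `is_number` and `category` names are assumptions made for illustration.

```python
age = 59

# isinstance is the idiomatic way to check a value's type
is_number = isinstance(age, (int, float))

if is_number and age >= 65:
    category = "senior"
elif is_number and (age >= 18 and age < 65):
    category = "adult"
elif is_number:
    category = "minor"
else:
    category = "unknown"

print(category)
```

Note that `(type(age) != float) or (type(age) != int)` is True for any value, because no value is both a float and an int at once. The `and` version is the one that actually checks "neither float nor int".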


# <center>MODULES</center>

---

## <font color='LIGHTGRAY'>By the end of the day you'll be able to:</font>
- <font color='LIGHTGRAY'>write conditional statements using boolean logic</font>
- **import modules/packages**
- <font color='LIGHTGRAY'>describe why the pandas module is useful</font>
- <font color='LIGHTGRAY'>create a dataframe and summarize it</font>

What are modules/packages?
- libraries of code
- specific to tasks/functions
- many common functions have already been written and optimized by others, and are often much faster than what you would write yourself
- we will be using packages in addition to base Python in the next two weeks

In [6]:
# how to get mean of `nums_list`?
nums_list = [1, 2, 3, 4, 5, 10, 20, 50, 200]

## <center>Let's google it!</center>

In [7]:
import statistics

print(statistics.mean(nums_list))

32.77777777777778
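
For comparison, here is a sketch of the same calculation written by hand in base Python, next to the library call. The names `manual_mean` and `library_mean` are made up for illustration.

```python
import statistics

nums_list = [1, 2, 3, 4, 5, 10, 20, 50, 200]

# hand-written mean: add everything up, then divide by the count
manual_mean = sum(nums_list) / len(nums_list)

# library version: one call, already written and tested
library_mean = statistics.mean(nums_list)

print(manual_mean)
print(library_mean)
```

A mean is simple enough to write by hand, but for less trivial functions (standard deviation, median, and so on) the library version saves time and avoids mistakes.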


In [8]:
help(statistics.mean)

Help on function mean in module statistics:

mean(data)
    Return the sample arithmetic mean of data.
    
    >>> mean([1, 2, 3, 4, 4])
    2.8
    
    >>> from fractions import Fraction as F
    >>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])
    Fraction(13, 21)
    
    >>> from decimal import Decimal as D
    >>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])
    Decimal('0.5625')
    
    If ``data`` is empty, StatisticsError will be raised.



In [10]:
from statistics import mean

print(mean(nums_list))

32.77777777777778


In [11]:
import numpy as np

print(np.mean(nums_list))

32.77777777777778


In [12]:
help(np.mean)

Help on function mean in module numpy:

mean(a, axis=None, dtype=None, out=None, keepdims=<no value>, *, where=<no value>)
    Compute the arithmetic mean along the specified axis.
    
    Returns the average of the array elements.  The average is taken over
    the flattened array by default, otherwise over the specified axis.
    `float64` intermediate and return values are used for integer inputs.
    
    Parameters
    ----------
    a : array_like
        Array containing numbers whose mean is desired. If `a` is not an
        array, a conversion is attempted.
    axis : None or int or tuple of ints, optional
        Axis or axes along which the means are computed. The default is to
        compute the mean of the flattened array.
    
        .. versionadded:: 1.7.0
    
        If this is a tuple of ints, a mean is performed over multiple axes,
        instead of a single axis or all the axes as before.
    dtype : data-type, optional
        Type to use in computing the mean.
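
The `axis` parameter described in the help text only matters for data with more than one dimension. Here is a small sketch on a made-up 2-row, 3-column array (the `table` values and variable names are assumptions for illustration):

```python
import numpy as np

# a small table of numbers with 2 rows and 3 columns
table = np.array([[1, 2, 3],
                  [4, 5, 6]])

overall_mean = np.mean(table)          # all six values, flattened
column_means = np.mean(table, axis=0)  # one mean per column
row_means = np.mean(table, axis=1)     # one mean per row

print(overall_mean)
print(column_means)
print(row_means)
```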

### **<font color='GREEN'> Exercise</font>**

Google the standard deviation function in the `statistics` and `numpy` Python packages. Import the packages and then use the functions on `nums_list`.

In [13]:
# TODO: insert solution here

# The `pandas` Module

![](data/panda.jpg)

In [1]:
import pandas as pd

## Why `pandas`?

## <font color='LIGHTGRAY'>By the end of the day you'll be able to:</font>
- <font color='LIGHTGRAY'>write conditional statements using boolean logic</font>
- <font color='LIGHTGRAY'>import modules/packages</font>
- **describe why the `pandas` module is useful**
- <font color='LIGHTGRAY'>create a dataframe and summarize it</font>

`pandas` lets you work with tabular data that has mixed types, like an Excel sheet.

In Week 4, we will work with `pandas` in more depth.

## The `DataFrame` Container Type

- Part of the `pandas` package
- Spreadsheet or table-like representation of data
- Can store mixed types
- Columns and rows are named
- Like a nested list, where all the sublists have the same shape (basically a matrix)
- Lots of functions for cleaning and massaging data, grouping, aggregations, plotting
- Exceptionally popular
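
To make the "nested list" comparison concrete, here is a minimal sketch that builds a small `DataFrame` from a list of rows. The data, the column names, and the `small_df` variable are made up for illustration.

```python
import pandas as pd

# each inner list is one row; all rows have the same length
rows = [
    ["apple", 3, True],
    ["pear", 0, False],
]

# unlike a plain nested list, the columns get names
small_df = pd.DataFrame(rows, columns=["item", "count", "in_stock"])

print(small_df.shape)    # (number of rows, number of columns)
print(small_df.dtypes)   # each column keeps its own type
```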

## Creating a `DataFrame`

## <font color='LIGHTGRAY'>By the end of the day you'll be able to:</font>
- <font color='LIGHTGRAY'>write conditional statements using boolean logic</font>
- <font color='LIGHTGRAY'>import modules/packages</font>
- <font color='LIGHTGRAY'>describe why the pandas module is useful</font>
- **create a `DataFrame` and summarize it**

In [2]:
names_list = ['Ashley', 'Andras', 'Rihanna', 'Emily']
ages_list = [30, 36, 28, 33]
birthplaces_list = ['USA', 'Hungary', 'Barbados', 'USA']
singers_list = [False, False, True, False]
people_dict = {
    "name": names_list,
    "age": ages_list,
    "birthplace": birthplaces_list,
    "is_singer": singers_list
}

In [3]:
people_df = pd.DataFrame(people_dict)
people_df

|   |name   |age|birthplace|is_singer|
|---|-------|---|----------|---------|
|0  |Ashley |30 |USA       |False    |
|1  |Andras |36 |Hungary   |False    |
|2  |Rihanna|28 |Barbados  |True     |
|3  |Emily  |33 |USA       |False    |


In [4]:
people_df.shape

(4, 4)

In [5]:
people_df.columns

Index(['name', 'age', 'birthplace', 'is_singer'], dtype='object')

In [6]:
people_df.dtypes

name          object
age            int64
birthplace    object
is_singer       bool
dtype: object
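
Beyond `shape`, `columns`, and `dtypes`, a few more calls are commonly used to summarize a `DataFrame`. The sketch below rebuilds `people_df` so it runs on its own; which summaries to show is a choice made for illustration, not part of the original lecture.

```python
import pandas as pd

people_df = pd.DataFrame({
    "name": ['Ashley', 'Andras', 'Rihanna', 'Emily'],
    "age": [30, 36, 28, 33],
    "birthplace": ['USA', 'Hungary', 'Barbados', 'USA'],
    "is_singer": [False, False, True, False],
})

print(people_df.head(2))                       # first two rows
print(people_df["age"].mean())                 # average of one column
print(people_df["birthplace"].value_counts())  # number of rows per birthplace
print(people_df["is_singer"].sum())            # each True counts as 1
print(people_df.describe())                    # summary statistics for numeric columns
```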

# Conclusion

## You are now able to:
- write conditional statements using boolean logic
- import modules/packages
- describe why the `pandas` module is useful
- create a dataframe and summarize it