# Foundations of Data Science: Preparing the data 
- Course intro
- Intro to data science 
- Python DS libraries 


# Course Intro


## Python libraries 
- Python ships with a set of built-in (standard) libraries.
- To check which third-party packages are currently installed, run pip freeze.
- If a library you need is missing, install it with pip install, e.g. pip install pandas.
- Installing is a one-time step, but a library must be imported in every session (notebook or script) that uses it.

### Note: you can import a full module or individual names from a module
- Importing just the class or function you need saves typing: after from datetime import date you can write date.today() instead of datetime.date.today().
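
The two import styles can be compared side by side. This is a minimal sketch that uses only the standard-library datetime module; the variable names are illustrative and not part of the course material.

```python
# Style 1: import the whole module, then spell out module.class.method().
import datetime

full_path_today = datetime.date.today()

# Style 2: import only the class you need, so the module prefix is dropped.
from datetime import date

short_path_today = date.today()

# Both names point at the same class, so the results have the same type.
print(datetime.date is date)
print(type(full_path_today) is type(short_path_today))
```

Either style works; the second is shorter to type, while the first makes it obvious where each name comes from.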

In [1]:
pip freeze

asttokens==2.4.1
attrs==23.2.0
beautifulsoup4==4.12.3
bleach==6.1.0
colorama==0.4.6
comm==0.2.1
debugpy==1.8.0
decorator==5.1.1
defusedxml==0.7.1
executing==2.0.1
fastjsonschema==2.19.1
ipykernel==6.28.0
ipython==8.20.0
jedi==0.19.1
Jinja2==3.1.3
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter_client==8.6.0
jupyter_core==5.7.1
jupyterlab_pygments==0.3.0
MarkupSafe==2.1.4
matplotlib-inline==0.1.6
mistune==3.0.2
nbclient==0.9.0
nbconvert==7.14.2
nbformat==5.9.2
nest-asyncio==1.5.8
packaging==23.2
pandocfilters==1.5.1
parso==0.8.3
platformdirs==4.1.0
prompt-toolkit==3.0.43
psutil==5.9.7
pure-eval==0.2.2
Pygments==2.17.2
PyPDF2==3.0.1
python-dateutil==2.8.2
pywin32==306
pyzmq==25.1.2
referencing==0.33.0
rpds-py==0.17.1
six==1.16.0
soupsieve==2.5
stack-data==0.6.3
tinycss2==1.2.1
tornado==6.4
traitlets==5.14.1
wcwidth==0.2.13
webencodings==0.5.1
Note: you may need to restart the kernel to use updated packages.


## Working with date and time in python
Cheat sheet for date and time format codes: https://strftime.org/


In [12]:
from datetime import date

In [22]:
today_dt = date.today()
# Prints the date in ISO 8601 format (YYYY-MM-DD)
print(today_dt)

# Extract the year, month, and day from the date
print(today_dt.year)
print(today_dt.month)
print(today_dt.day)
# Check the data type of the date object
print(type(today_dt))

2024-02-03
2024
2
3
<class 'datetime.date'>


In [18]:
#Convert date to a different format 
print(today_dt.strftime('%m-%d-%Y')) #USA format 
print(today_dt.strftime('%B')) #Full month name 

#Can use slashes instead of dashes as well 
print(today_dt.strftime('%m/%d/%Y')) #USA format 


02-03-2024
February
02/03/2024
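
The same format codes also work in the other direction, turning text back into a date. The sketch below assumes a fixed date (so the output is reproducible) and uses datetime.strptime from the standard library; the variable names are illustrative.

```python
from datetime import date, datetime

fixed_day = date(2024, 2, 3)  # fixed date so the output is reproducible

# Date -> text, using the USA format from the cell above.
us_text = fixed_day.strftime('%m/%d/%Y')
print(us_text)  # 02/03/2024

# Text -> date, using the same format string to parse.
parsed = datetime.strptime(us_text, '%m/%d/%Y').date()
print(parsed)               # 2024-02-03
print(parsed == fixed_day)  # True
```

The format string passed to strptime has to match the text exactly, including the separators.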


In [23]:
from datetime import time

In [28]:
# 9:30:10 PM (21:30:10 in 24-hour time)
t = time(21,30,10)
print(t)
print(t.hour)
print(t.minute)
print(t.second)


21:30:10
21
30
10


In [30]:
import datetime
dt = datetime.datetime.now() # datetime is the module; it contains a class that is also named datetime
print(dt)

2024-02-03 08:43:02.567487


## Time delta

In [33]:
# Add 30 days to a pre-defined date
# First, define the offset as a timedelta object
delta = datetime.timedelta(days=30)
print(delta)

# Then add the delta to the date
print(dt + delta)

30 days, 0:00:00
2024-03-04 08:43:02.567487


In [35]:
# A timedelta can combine several components (days, hours, minutes, ...) for finer precision
delta = datetime.timedelta(days=20,hours=10,minutes=5)
print(dt)
print(delta)
print(dt+delta)

2024-02-03 08:43:02.567487
20 days, 10:05:00
2024-02-23 18:48:02.567487


## Difference between two dates

In [39]:
date1 = datetime.date(2020,1,1)
date2 = datetime.date.today()

diff = date2 - date1 
print(diff)

1494 days, 0:00:00
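
Subtracting two dates gives a timedelta, and its parts can be read directly instead of parsing the printed text. This sketch assumes a fixed end date (the date of the run above) so that the numbers are reproducible.

```python
import datetime

start = datetime.date(2020, 1, 1)
end = datetime.date(2024, 2, 3)  # fixed end date matching the run above

diff = end - start

print(diff.days)             # 1494
print(diff.total_seconds())  # 129081600.0
print(diff.days // 7)        # 213 whole weeks
```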


In [42]:
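# Work with time zones using pytz (a third-party package)
# Note: this import rebinds the name datetime from the module to the class
from datetime import datetime, timedelta
from pytz import timezone
import pytz
utc = pytz.utc
utc.zone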

'UTC'
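
The cell above only looks up the UTC zone. As a rough illustration of an actual conversion, the sketch below uses the standard library's fixed-offset timezone class instead of pytz, so it runs without extra installs. The UTC-5 offset and the timestamp are assumptions chosen for the example, and a fixed offset does not follow daylight-saving rules the way a named zone would.

```python
from datetime import datetime, timedelta, timezone

# A timezone-aware timestamp in UTC (fixed value so the output is reproducible).
utc_time = datetime(2024, 2, 3, 13, 43, 2, tzinfo=timezone.utc)

# Hypothetical fixed offset of UTC-5; no daylight-saving adjustment is applied.
utc_minus_5 = timezone(timedelta(hours=-5), name="UTC-5")

# astimezone() shifts the wall-clock time but keeps the same instant.
local_time = utc_time.astimezone(utc_minus_5)

print(utc_time)                # 2024-02-03 13:43:02+00:00
print(local_time)              # 2024-02-03 08:43:02-05:00
print(local_time == utc_time)  # True
```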

## Python packages for data science 
- NumPy: the foundation for the other libraries covered here. It is a core library for numerical computation and linear algebra, built around arrays, which are typically faster than lists for numerical work and, unlike tuples, can be modified in place.
- SciPy
- Matplotlib 
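
To make the array-versus-list point concrete, here is a small sketch. It assumes NumPy is already installed; the variable names are illustrative.

```python
import numpy as np

values = [1, 2, 3, 4]
arr = np.array(values)

# On an array, arithmetic is applied element by element...
doubled = arr * 2
print(doubled)      # [2 4 6 8]

# ...whereas the same operator repeats a list.
print(values * 2)   # [1, 2, 3, 4, 1, 2, 3, 4]

# Arrays are mutable, like lists and unlike tuples.
arr[0] = 10
print(arr)          # [10  2  3  4]
```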


## Intro to SciPy and NumPy
- Run pip install first (only needed once per environment).
- Restart the kernel (only needed after installing).
- Run the import statement to load the library (needed in every session).

In [2]:
pip install numpy


Collecting numpy
  Obtaining dependency information for numpy from https://files.pythonhosted.org/packages/ad/11/52fbe97fd84c91105b651d25a122f8deed6d3519afb14f9771fac1c9b7de/numpy-1.26.3-cp312-cp312-win_amd64.whl.metadata
  Downloading numpy-1.26.3-cp312-cp312-win_amd64.whl.metadata (61 kB)
Downloading numpy-1.26.3-cp312-cp312-win_amd64.whl (15.5 MB)


[notice] A new release of pip is available: 23.2.1 -> 23.3.2
[notice] To update, run: python.exe -m pip install --upgrade pip


In [3]:
pip install scipy

Collecting scipy
Note: you may need to restart the kernel to use updated packages.

  Obtaining dependency information for scipy from https://files.pythonhosted.org/packages/f3/31/91a2a3c5eb85d2bfa86d7c98f2df5d77dcdefb3d80ca9f9037ad04393acf/scipy-1.12.0-cp312-cp312-win_amd64.whl.metadata
  Downloading scipy-1.12.0-cp312-cp312-win_amd64.whl.metadata (60 kB)
Downloading scipy-1.12.0-cp312-cp312-win_amd64.whl (45.8 MB)


[notice] A new release of pip is available: 23.2.1 -> 23.3.2
[notice] To update, run: python.exe -m pip install --upgrade pip


In [4]:
import numpy, scipy

In [6]:
# List every name defined in scipy.constants
from scipy import constants
dir(constants)

['Avogadro',
 'Boltzmann',
 'Btu',
 'Btu_IT',
 'Btu_th',
 'G',
 'Julian_year',
 'N_A',
 'Planck',
 'R',
 'Rydberg',
 'Stefan_Boltzmann',
 'Wien',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_codata',
 '_constants',
 '_obsolete_constants',
 'acre',
 'alpha',
 'angstrom',
 'arcmin',
 'arcminute',
 'arcsec',
 'arcsecond',
 'astronomical_unit',
 'atm',
 'atmosphere',
 'atomic_mass',
 'atto',
 'au',
 'bar',
 'barrel',
 'bbl',
 'blob',
 'c',
 'calorie',
 'calorie_IT',
 'calorie_th',
 'carat',
 'centi',
 'codata',
 'constants',
 'convert_temperature',
 'day',
 'deci',
 'degree',
 'degree_Fahrenheit',
 'deka',
 'dyn',
 'dyne',
 'e',
 'eV',
 'electron_mass',
 'electron_volt',
 'elementary_charge',
 'epsilon_0',
 'erg',
 'exa',
 'exbi',
 'femto',
 'fermi',
 'find',
 'fine_structure',
 'fluid_ounce',
 'fluid_ounce_US',
 'fluid_ounce_imp',
 'foot',
 'g',
 'gallon',
 'gallon_US',
 'gallon_imp',
 'gas_co

In [12]:
print(constants.pi)
print(constants.Avogadro)
print(constants.metric_ton)
# constants.metric_ton is the number of kilograms in one metric ton
print("load size in kg:", 50*constants.metric_ton)

3.141592653589793
6.02214076e+23
1000.0
load size in kg: 50000.0


### Trigonometry

In [14]:
import numpy as np # np is the conventional alias for numpy
# Calculate the sine of a 45-degree angle; constants.degree converts degrees to radians
np.sin(45*constants.degree)

0.7071067811865476
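
NumPy's trigonometric functions expect radians, which is why the angle is multiplied by constants.degree (the number of radians in one degree, pi/180). The sketch below does the same conversion with NumPy's own np.deg2rad, so it does not need SciPy, and checks the result against the known value of sqrt(2)/2.

```python
import math

import numpy as np

angle_deg = 45

# Convert degrees to radians before calling np.sin.
angle_rad = np.deg2rad(angle_deg)
sine_value = np.sin(angle_rad)
print(sine_value)

# sin(45 degrees) should equal sqrt(2)/2 up to floating-point rounding.
print(np.isclose(sine_value, math.sqrt(2) / 2))  # True
```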

### Stat functions in NumPy

In [17]:
from numpy import mean 
myList = [2,3,6,7,8,9,0]
print(mean(myList))

5.0
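
A few other summary statistics follow the same pattern as mean. This sketch reuses the list from the cell above and calls NumPy functions only; note that np.std computes the population standard deviation by default.

```python
import numpy as np

my_list = [2, 3, 6, 7, 8, 9, 0]

print(np.mean(my_list))    # 5.0
print(np.median(my_list))  # 6.0
print(np.min(my_list), np.max(my_list))

# Population standard deviation (divides by n, not n - 1).
spread = np.std(my_list)
print(spread)
```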


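#### Get the derivative of an equation using SciPy
Differentiate the following function and evaluate the derivative at x = 2:

$$ f(x) = 3x^2 + 9x - 5 $$

Expected result: $f'(x) = 6x + 9$, so $f'(2) = 21$.

In [25]:
# Note: scipy.misc.derivative is deprecated since SciPy 1.10 and is not available in newer releases
from scipy.misc import derivative as drv

def eqn(x):
    # f(x) = 3x^2 + 9x - 5
    return 3 * x**2 + 9*x - 5

In [30]:
# Central-difference estimate of the derivative at x = 2 (default step dx=1.0)
drv(eqn,2)

21.0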
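
Because scipy.misc.derivative is deprecated, a numerical derivative can also be sketched by hand with a central difference. This is a minimal illustration, not a SciPy API: central_difference and its dx parameter are hypothetical names, and the function differentiated is the 3x^2 + 9x - 5 stated in the text.

```python
def eqn(x):
    # f(x) = 3x^2 + 9x - 5, the function stated in the text
    return 3 * x**2 + 9 * x - 5


def central_difference(f, x0, dx=1e-6):
    # Hypothetical helper: approximates f'(x0) as (f(x0 + dx) - f(x0 - dx)) / (2 * dx).
    return (f(x0 + dx) - f(x0 - dx)) / (2 * dx)


approx = central_difference(eqn, 2)
exact = 6 * 2 + 9  # analytic derivative 6x + 9 evaluated at x = 2

print(round(approx, 6))  # 21.0
print(exact)             # 21
```

The approximation agrees with the analytic result 6x + 9 at x = 2 up to floating-point rounding.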