# Python Data Science Lecture 2

## Conda Example

### Environment Management

#### Create an environment

conda create --name lecture_2 numpy python=2.7

#### Activate an environment

conda activate lecture_2
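After activating, a quick way to confirm which interpreter is in use is to ask Python itself. This is a minimal sketch using only the standard library; the paths it prints depend on where conda is installed on your machine.

```python
import sys

# Path of the interpreter that is running this code
print(sys.executable)
# Root directory of the active environment
print(sys.prefix)
# Major and minor Python version of the active environment
print("%d.%d" % sys.version_info[:2])
```

If the environment is active, the printed paths should point inside the `lecture_2` environment directory.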

#### Export an environment

conda env export --file lecture_2_env.yml

#### Deactivate an environment

conda deactivate

#### Remove an environment

conda env remove --name lecture_2 

#### Create a new environment from an export

conda env create --file lecture_2_env.yml

#### List available environments

conda env list

### Package Management

#### Install libraries

conda install scipy

#### List packages installed in the current environment

conda list

#### Uninstall packages

conda uninstall scipy

#### Update packages

conda update numpy

## argparse Example

#### Code in example script argTest.py

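import argparse

def main():
    # Build the parser and read the arguments from the command line
    parser = argparse.ArgumentParser()
    parser.add_argument("var1", type=str, help="any string")
    parser.add_argument("--var2", help="any string")
    parser.add_argument("-v3", "--var3", type=str, help="any string")
    args = parser.parse_args()

    print(args.var1)

    # var3 is optional, so only print it when it was given
    if args.var3:
        print(args.var3)

if __name__ == "__main__":
    main()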

#### How to use argparse

In [1]:
import argparse

#### Initialize the argument parser

In [4]:
parser = argparse.ArgumentParser()

#### Add arguments

In [5]:
parser.add_argument("var1", type=str, help="any string")

_StoreAction(option_strings=[], dest='var1', nargs=None, const=None, default=None, type=<class 'str'>, choices=None, help='any string', metavar=None)

#### Things to remember about argparse

Arguments are **positional by default**, which means that the order of the arguments on the command line matters.

Adding "--" to an argument name makes it an **optional** argument:

In [6]:
parser.add_argument("--var2", help="any string")

_StoreAction(option_strings=['--var2'], dest='var2', nargs=None, const=None, default=None, type=None, choices=None, help='any string', metavar=None)

You can also give an optional argument a short form:

In [7]:
parser.add_argument("-v3","--var3",type=str, help="any string")

_StoreAction(option_strings=['-v3', '--var3'], dest='var3', nargs=None, const=None, default=None, type=<class 'str'>, choices=None, help='any string', metavar=None)

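Make the **args** object. Inside a notebook this call fails, because `parse_args()` reads the kernel's own command line, which includes a `-f` argument. Run from a terminal, it reads the script's arguments instead: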

In [10]:
args = parser.parse_args()

usage: ipykernel_launcher.py [-h] [--var2 VAR2] [-v3 VAR3] var1
ipykernel_launcher.py: error: unrecognized arguments: -f


SystemExit: 2
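One way to try the parser inside a notebook is to pass the arguments to `parse_args()` as a list, so that it does not read the kernel's command line. The sketch below rebuilds the same parser; the values `"cat"` and `"fish"` are just sample inputs.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("var1", type=str, help="any string")
parser.add_argument("--var2", help="any string")
parser.add_argument("-v3", "--var3", type=str, help="any string")

# Pass the arguments explicitly instead of letting argparse read sys.argv
args = parser.parse_args(["cat", "-v3", "fish"])

print(args.var1)
print(args.var3)
print(args.var2)  # optional arguments that are not given default to None
```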

#### Basics of the script:

#### Example 1:

python argTest.py cat -v3 fish

cat
fish

#### Example 2:

python argTest.py -v3 cat fish

fish
cat

#### Example 3:

python argTest.py cat -v3 dog --var2 fish

cat
dog
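The three examples can also be checked without a terminal. The sketch below feeds each command line to the parser as a list and prints how the words are assigned to `var1`, `var2`, and `var3`. The helper `build_parser` is a hypothetical name introduced here for illustration; it is not part of argTest.py.

```python
import argparse

def build_parser():
    # Hypothetical helper: the same three arguments used in this lecture
    parser = argparse.ArgumentParser()
    parser.add_argument("var1", type=str, help="any string")
    parser.add_argument("--var2", help="any string")
    parser.add_argument("-v3", "--var3", type=str, help="any string")
    return parser

# Each list stands in for the words typed after "python argTest.py"
command_lines = [
    ["cat", "-v3", "fish"],
    ["-v3", "cat", "fish"],
    ["cat", "-v3", "dog", "--var2", "fish"],
]

# vars() turns the parsed namespace into a plain dictionary
results = [vars(build_parser().parse_args(argv)) for argv in command_lines]
for argv, parsed in zip(command_lines, results):
    print(argv, "->", parsed)
```

In the second command line, `-v3` takes the word right after it (`cat`), so the remaining word (`fish`) becomes the positional `var1`.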

## Jupyter Notebook

Jupyter Notebook is really handy for:

* Running pieces of a larger script
* Keeping track of what you did (like a lab notebook)

#### How to install Jupyter notebook

conda install --channel conda-forge notebook

conda update notebook

conda install arrow

#### Open Jupyter notebook

jupyter notebook

#### Files tab

This is where you manage your notebooks:
* Open existing notebooks
* Make new notebooks

#### Running tab

These are the currently active notebooks

#### In a notebook

code vs markdown cells

Checkpoints

Exporting a notebook (download as)

## Numpy Example

In [54]:
import numpy as np

Predictions:

In [55]:
hatY = [1.5,2.6,2.8,3.9]

In [56]:
hatY = np.array(hatY)

Actual data:

In [57]:
Y = [1,2,3,4]

In [58]:
Y = np.array(Y)

In [59]:
n = float(len(Y))

In [60]:
n

4.0

In [61]:
meanSquareError = (1/n) * np.sum(np.square(hatY - Y))

In [62]:
meanSquareError

0.16500000000000004
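The same quantity can be written more compactly with `np.mean`, which does the sum and the division by `n` in one step. This is a sketch of an equivalent calculation, not a replacement for the cells above; the names `mse_sum` and `mse_mean` are introduced here for illustration.

```python
import numpy as np

hatY = np.array([1.5, 2.6, 2.8, 3.9])  # predictions
Y = np.array([1, 2, 3, 4])             # actual data

# Version from the cells above: sum of squared errors divided by n
n = float(len(Y))
mse_sum = (1 / n) * np.sum(np.square(hatY - Y))

# Equivalent version: mean of the squared errors
mse_mean = np.mean((hatY - Y) ** 2)

print(mse_sum, mse_mean)
```

Both values should agree up to floating point rounding, at about 0.165.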

## Scipy Example

We can use SciPy to run statistical tests and perform probability calculations.

To start off, let's sample from a binomial distribution. We want to sample the number of heads we get from a given number of coin flips.

First, a single coin flip:

In [63]:
from scipy.stats import binom

Number of flips

In [64]:
n = 1

Probability of a head

In [65]:
p = 0.5

In [68]:
r = binom.rvs(n, p, size=1)

In [69]:
r

array([1])

Now let's sample assuming we did 10 coin flips:

In [70]:
n = 10

In [71]:
r = binom.rvs(n, p, size=1)

In [72]:
r

array([3])
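Besides sampling, the same `binom` object can do probability calculations. As a small sketch, the code below asks how likely it is to see exactly 3 heads, or 3 or fewer heads, in 10 fair flips. The variable names are chosen here for illustration.

```python
from scipy.stats import binom

n = 10   # number of flips
p = 0.5  # probability of a head

# Probability of exactly 3 heads (probability mass function)
prob_exactly_three = binom.pmf(3, n, p)

# Probability of 3 or fewer heads (cumulative distribution function)
prob_at_most_three = binom.cdf(3, n, p)

print(prob_exactly_three, prob_at_most_three)
```

By hand, the first value is 120/1024 (about 0.117) and the second is 176/1024 (about 0.172).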

#### Random seeds in NumPy

Setting a random seed makes the output of a random function reproducible: the same seed gives the same sequence of samples.

**NOTE:** NumPy has its own random seed, separate from the one in Python's built-in `random` module.

In [73]:
probsArray = np.array([0.1,0.4,0.5])

In [78]:
sampleMultinomial = np.random.multinomial(1, probsArray, size=5)

In [79]:
sampleMultinomial

array([[0, 0, 1],
       [0, 0, 1],
       [0, 0, 1],
       [0, 0, 1],
       [0, 1, 0]])
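The cells above draw samples without actually setting a seed, so the result changes on every run. Here is a minimal sketch of seeding with `np.random.seed`; the seed value 42 is an arbitrary choice. By default `scipy.stats` draws from NumPy's global random state, so the same seed also fixes `binom.rvs`.

```python
import numpy as np
from scipy.stats import binom

probsArray = np.array([0.1, 0.4, 0.5])

# Seed, sample, then reseed with the same value and sample again
np.random.seed(42)  # 42 is arbitrary; any integer works
first = np.random.multinomial(1, probsArray, size=5)
np.random.seed(42)
second = np.random.multinomial(1, probsArray, size=5)
print(np.array_equal(first, second))  # True: same seed, same samples

# The NumPy seed also controls scipy.stats sampling
np.random.seed(42)
r1 = binom.rvs(10, 0.5, size=3)
np.random.seed(42)
r2 = binom.rvs(10, 0.5, size=3)
print(np.array_equal(r1, r2))  # True
```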