## Conda

Creating and removing virtual environments with conda

In [None]:
conda create -n env_name python=3.6 pandas numpy
conda env remove -n env_name

Create an environment named py3 with Python 3

In [None]:
conda create -n py3 python=3

Activating environment

In [None]:
conda activate my_env  # on conda older than 4.4: source activate my_env

Export the active environment to a YAML file to share it with teammates

In [None]:
conda env export > environment.yaml

Create environment from yaml file

In [None]:
conda env create -f environment.yaml

## Jupyter

Install nb_conda to manage conda environments from within Jupyter

In [None]:
conda install nb_conda

Convert a notebook to slides and serve them in the browser (set each cell's slide type in the Slideshow cell toolbar first):

In [None]:
jupyter nbconvert notebook.ipynb --to slides --post serve

## Pandas

### Reading a file

Read a file and replace column names with custom ones

In [None]:
import pandas as pd

labels = ['id', 'name', 'attendance', 'hw', 'test1', 'project1', 'test2', 'project2', 'final']
df = pd.read_csv('file_name.csv', header=0, names=labels)
df.head()
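Here is a self-contained sketch of the same call. It uses `io.StringIO` with made-up CSV text in place of `file_name.csv`, and the column names and values are hypothetical. `header=0` marks the first line as the original header, and `names` replaces it:

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for 'file_name.csv'
csv_text = """student_id,student_name,att
1,Ann,0.9
2,Bob,0.8
"""

labels = ['id', 'name', 'attendance']

# header=0: first line is the original header; names=: replace it with our labels
df = pd.read_csv(io.StringIO(csv_text), header=0, names=labels)
print(df.columns.tolist())  # ['id', 'name', 'attendance']
```

If you pass `names` without `header=0`, pandas treats the original header line as a data row.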

### Data frame summaries

Data types of all columns. Note: string columns show up as `object` dtype because pandas stores **pointers** to Python string objects, not the strings themselves

In [None]:
df.dtypes
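The pointer note can be seen directly on a small hypothetical frame. This is a sketch: in pandas 1.x/2.x the string column reports `object`, while newer versions may use a dedicated string dtype. `memory_usage()` without `deep=True` counts only the per-cell references. With `deep=True` it also counts the string objects they point to:

```python
import pandas as pd

# Hypothetical data: one string column, one integer column
df = pd.DataFrame({'name': ['Ann', 'Bob'], 'score': [90, 85]})
print(df.dtypes)

# Each cell of the string column holds an ordinary Python str
print(type(df['name'].iloc[0]))  # <class 'str'>

# Shallow usage counts the stored references only; deep=True adds the strings themselves
shallow = df.memory_usage()['name']
deep = df.memory_usage(deep=True)['name']
print(shallow, deep)
```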

Summary statistics

In [None]:
df.describe()
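By default `describe()` summarises only numeric columns. Passing `include='all'` adds count, unique, top and freq for text columns. Here is a small sketch with hypothetical scores:

```python
import pandas as pd

# Hypothetical scores plus a text column
df = pd.DataFrame({'test1': [70, 80, 90], 'name': ['Ann', 'Bob', 'Cy']})

summary = df.describe()  # numeric columns only
print(summary.loc['mean', 'test1'])  # 80.0

# Include the text column as well (NaN where a statistic does not apply)
print(df.describe(include='all'))
```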

### Selecting columns

Select all columns from 'id' through 'fractal_dimension_mean' (label slicing with `.loc` includes the end label)

In [None]:
df_means = df.loc[:,'id':'fractal_dimension_mean']

Or by position (`.iloc` excludes the end position, so `:11` selects columns 0 to 10):

In [None]:
df_means = df.iloc[:,:11]
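The two selections above differ in how they treat the end point. This small hypothetical frame shows that a `.loc` slice ending at a label matches an `.iloc` slice stopping one position later:

```python
import pandas as pd

# Hypothetical frame with a few "mean" columns and one "se" column
df = pd.DataFrame([[1, 2.0, 3.0, 0.1]],
                  columns=['id', 'radius_mean', 'texture_mean', 'radius_se'])

by_label = df.loc[:, 'id':'texture_mean']  # label slice: end label included
by_position = df.iloc[:, :3]               # position slice: stops before position 3

print(by_label.columns.tolist())     # ['id', 'radius_mean', 'texture_mean']
print(by_label.equals(by_position))  # True
```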

View the index number and label for each column

In [None]:
for i, v in enumerate(df.columns):
    print(i, v)
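To look up the position of a single column rather than listing them all, `Index.get_loc` returns it directly. Here is a sketch with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame(columns=['id', 'name', 'attendance', 'hw'])

for i, v in enumerate(df.columns):
    print(i, v)

# Direct lookup of one column's position
pos = df.columns.get_loc('attendance')
print(pos)  # 2
```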

Convenient way to generate indices to select non-adjacent columns:

In [None]:
import numpy as np
indices = np.r_[:2, 12:22]
df_SE = df.iloc[:,indices]
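`np.r_` concatenates several slices into one integer array, which `.iloc` then accepts as a list of positions. A sketch on a hypothetical six-column frame:

```python
import numpy as np
import pandas as pd

# Slices :2 and 4:6 become positions 0, 1, 4, 5
indices = np.r_[:2, 4:6]
print(indices)  # [0 1 4 5]

# Hypothetical 6-column frame to select from
df = pd.DataFrame([range(6)], columns=list('abcdef'))
subset = df.iloc[:, indices]
print(subset.columns.tolist())  # ['a', 'b', 'e', 'f']
```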

### Imputation

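Fill missing values in a column with a chosen value (`new_value` is a placeholder). Assign the result back rather than calling `fillna(..., inplace=True)` on the selected column: recent pandas versions warn about that pattern, and under Copy-on-Write it does not update `df`.

In [None]:
df['column_name'] = df['column_name'].fillna(new_value)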
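One common choice for the fill value is the column mean. This is a sketch with a hypothetical `hw` column; mean imputation is just one option, and the median or a constant work the same way:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'hw': [80.0, np.nan, 90.0]})

# mean() skips NaN, so the fill value is (80 + 90) / 2 = 85
df['hw'] = df['hw'].fillna(df['hw'].mean())
print(df['hw'].tolist())  # [80.0, 85.0, 90.0]
```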

### Duplicates

Find duplicate rows. `df.duplicated()` returns a boolean Series in which `True` marks a repeat: with the default `keep='first'`, the first occurrence of a row is labelled `False` and every later occurrence `True`.

In [None]:
df.duplicated()
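The `keep` and `subset` parameters change which rows get flagged. Here is a sketch on hypothetical data where row 2 repeats row 0:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 1], 'name': ['Ann', 'Bob', 'Ann']})

print(df.duplicated().tolist())              # [False, False, True]
print(df.duplicated(keep=False).tolist())    # [True, False, True]  (flag every copy)
print(df.duplicated(keep='last').tolist())   # [True, False, False] (keep the last copy)
print(df.duplicated(subset=['name']).sum())  # 1 (compare only the 'name' column)
```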

Remove duplicates

In [None]:
df.drop_duplicates(inplace=True)
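`drop_duplicates` keeps the original index labels of the surviving rows. `subset` restricts the comparison to some columns, and `reset_index(drop=True)` renumbers the result. Here is a sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 1, 3], 'name': ['Ann', 'Bob', 'Ann', 'Bob']})

deduped = df.drop_duplicates()  # row 2 repeats row 0 and is dropped
print(deduped.index.tolist())   # [0, 1, 3]

# Compare by name only, keep the first row per name, renumber from 0
by_name = df.drop_duplicates(subset=['name']).reset_index(drop=True)
print(by_name['id'].tolist())   # [1, 2]
```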

### Datetime

Convert a column of date strings to pandas datetime values

In [None]:
df['column_name'] = pd.to_datetime(df['column_name'])
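Here is a sketch with a hypothetical `date` column. `errors='coerce'` turns strings that cannot be parsed into `NaT` instead of raising. Once the column is datetime, the `.dt` accessor exposes parts such as the month:

```python
import pandas as pd

df = pd.DataFrame({'date': ['2021-01-15', '2021-02-01', 'not a date']})

# Unparseable entries become NaT rather than raising an error
df['date'] = pd.to_datetime(df['date'], errors='coerce')
print(df['date'].dtype)

# Date parts via the .dt accessor (NaN where the date is NaT)
df['month'] = df['date'].dt.month
print(df)
```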

### Plots

Display plots inline in a Jupyter notebook

In [None]:
%matplotlib inline

Histograms

In [None]:
df.hist()
df.hist(figsize=(8,8)); # set the figure size; the trailing ; suppresses the printed return value
df['column_name'].hist()
df.plot(kind='hist')
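`bins` controls how many bars a histogram has. The returned matplotlib Axes can be used to add labels. Here is a sketch with hypothetical scores; the `Agg` backend is set only so the code also runs outside a notebook:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, not needed inside Jupyter
import pandas as pd

df = pd.DataFrame({'test1': [55, 62, 70, 71, 78, 85, 90, 94]})

ax = df['test1'].hist(bins=4)  # four equal-width bins
ax.set_xlabel('test1')
ax.set_ylabel('count')
print(len(ax.patches))  # 4, one rectangle per bin
```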

Bar charts

In [None]:
df['column_name'].value_counts().plot(kind='bar')
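The bar chart plots the Series returned by `value_counts()`, sorted from most to least frequent. Here is a sketch with a hypothetical `grade` column:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, not needed inside Jupyter
import pandas as pd

df = pd.DataFrame({'grade': ['A', 'B', 'A', 'C', 'A', 'B']})

counts = df['grade'].value_counts()
print(counts.to_dict())  # {'A': 3, 'B': 2, 'C': 1}

ax = counts.plot(kind='bar')
print(len(ax.patches))  # 3, one bar per category
```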

Pie charts

In [None]:
df['column_name'].value_counts().plot(kind='pie')

Relationships for all pairs and histograms for each variable
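One way to get pairwise scatter plots with a histogram of each variable on the diagonal is pandas' `pd.plotting.scatter_matrix`. Seaborn's `pairplot` is an alternative. Here is a sketch with hypothetical numeric columns:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, not needed inside Jupyter
import pandas as pd

df = pd.DataFrame({'test1': [70, 80, 90, 65],
                   'test2': [72, 85, 88, 60],
                   'final': [75, 82, 91, 62]})

# Grid of Axes: scatter plot for each pair of columns, histograms on the diagonal
axes = pd.plotting.scatter_matrix(df, figsize=(6, 6), diagonal='hist')
print(axes.shape)  # (3, 3)
```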

Simple scatter plot for two variables:

In [None]:
df.plot(x='column_name1', y='column_name2', kind='scatter')

Box plot

In [None]:
df['column_name'].plot(kind='box')
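A box plot draws a box from the first to the third quartile with a line at the median. With matplotlib's default `whis=1.5`, whiskers reach the farthest points within 1.5 × IQR of the box, and points beyond are drawn individually as outliers. This sketch computes those quantities on hypothetical data:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 100])  # hypothetical data with one extreme value

q1, median, q3 = s.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr  # default whisker limit above the box
outliers = s[s > upper_fence]
print(q1, median, q3, upper_fence)  # 2.0 3.0 4.0 7.0
print(outliers.tolist())            # [100]
```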