![data-x](https://raw.githubusercontent.com/afo/data-x-plaksha/master/imgsource/dx_logo.png) 

## Introduction to Data-X
Mostly basics about Anaconda, Git, Python, and Jupyter Notebooks

#### Author: Alexander Fred-Ojala

---


# Useful Links
1. Managing conda environments:
    - https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
2. Learning Python (resources):
    - https://www.datacamp.com/
    - [Python Bootcamp](https://bids.berkeley.edu/news/python-boot-camp-fall-2016-training-videos-available-online)
3. Datahub: http://datahub.berkeley.edu/ (to run notebooks in the cloud)
4. Google Colab: https://colab.research.google.com (also for running notebooks in the cloud)
5. Data-X website resources: https://data-x.blog
6. Book: [Hands-On Machine Learning with Scikit-Learn and TensorFlow](https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291/ref=sr_1_1?ie=UTF8&qid=1516300239&sr=8-1&keywords=hands+on+machine+learning+with+scikitlearn+and+tensorflow)

# Introduction to Jupyter Notebooks

From the [Project Jupyter Website](https://jupyter.org/):

* *__Project Jupyter__ exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages. Collaborative, Reproducible.*

* *__The Jupyter Notebook__ is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.*

# A notebook contains two cell types: Markdown and Code

###### Markdown cells

Where you write text.

Or equations in LaTeX: $erf(x) = \frac{1}{\sqrt\pi}\int_{-x}^x e^{-t^2} dt$

Centered LaTeX matrices:

$$
\begin{bmatrix}
    x_{11} & x_{12} & x_{13} & \dots  & x_{1n} \\
    x_{21} & x_{22} & x_{23} & \dots  & x_{2n} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    x_{d1} & x_{d2} & x_{d3} & \dots  & x_{dn}
\end{bmatrix} 
$$

<div class='alert alert-warning'>Bootstrap CSS and `HTML`</div>

Python (or any other programming language) Code
```python
# simple adder function
def adder(x,y):
    return x+y
```

# Header 1
## Header 2
### Header 3...

**bold**, *italic*

Divider

_____

* Bullet
* Lists


1. Enumerated
2. Lists

Useful images:
![](https://image.slidesharecdn.com/juan-rodriguez-ucberkeley-120331003737-phpapp02/95/juanrodriguezuc-berkeley-3-728.jpg?cb=1333154305)

<img src='https://image.slidesharecdn.com/juan-rodriguez-ucberkeley-120331003737-phpapp02/95/juanrodriguezuc-berkeley-3-728.jpg?cb=1333154305' width='200px'>

---

An internal (HTML) link to a section in the notebook:


## <a href='#bottom'>Link: Take me to the bottom of the notebook</a>

___

## **Find a lot of useful Markdown commands here:** 
### https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet

___

# Code Cells
Code cells let you run Python commands interactively.

In [None]:
print('hello world!')
print('2nd row')

In [None]:
# Comment in a code cell

In [None]:
# Lines evaluated sequentially
# A cell displays output of last line
2+2
3+3
5+5

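A cell echoes only the value of its last expression, so `2+2` and `3+3` above produce no visible output. A minimal sketch of displaying every result with `print()` (the `results` list is an illustrative name, not part of the original notebook):

```python
# Collect the three expressions from the cell above
results = [2 + 2, 3 + 3, 5 + 5]

# print() shows each value explicitly, regardless of its position in the cell
for value in results:
    print(value)
```
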
In [None]:
# Stuck in an infinite loop: interrupt the kernel (Kernel -> Interrupt) to stop it
while True:
    continue

In [None]:
# Cells evaluated sequentially

In [None]:
tmp_str = 'this is now stored in memory'
print(tmp_str)

In [None]:
print("Let's Start Over")

In [None]:
print(tmp_str)

## Jupyter / IPython Magic

In [None]:
# Magic commands (only for Jupyter and IPython, won't work in script)
%ls

In [None]:
# Time several runs of same operation
%timeit [i for i in range(1000)];

In [None]:
# Time operation
%time [x for x in range(1000)];

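`%timeit` and `%time` exist only in Jupyter and IPython. In a plain script, the standard-library `timeit` module offers a rough equivalent. The sketch below assumes 1000 repetitions, which is an arbitrary choice:

```python
import timeit

# Time 1000 runs of the same list comprehension used above
n_runs = 1000
total_seconds = timeit.timeit('[i for i in range(1000)]', number=n_runs)

# Average time per run, in microseconds
print(f'{total_seconds / n_runs * 1e6:.1f} microseconds per run')
```
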
In [None]:
%ls resources/

In [None]:
# %load resources/print_hw3.py
def print_hw(x):
    for i in range(int(x)):
        print(str(i)+' hello python script!')

print_hw(3)


In [None]:
%lsmagic

In [None]:
?%alias

In [None]:
?str

## Terminal / Command Prompt commands

In [None]:
# Shell commands
!cat resources/random.txt

In [None]:
# list files (macOS / Linux)
!ls

In [None]:
# list files (Windows)
!dir

In [None]:
# show the first line of a data file
!head -n 1 resources/sample_data.csv

In [None]:
# count rows (lines) of a data file
!wc -l resources/sample_data.csv

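The shell commands above depend on the operating system. A portable alternative is to do the same in plain Python. The sketch below writes a small temporary file as a hypothetical stand-in for `resources/sample_data.csv`, so its contents are an assumption, not the real data:

```python
import os
import tempfile

# Hypothetical stand-in for resources/sample_data.csv, so the sketch is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('id,value\n1,10\n2,20\n')
    path = tmp.name

# Equivalent of `head -n 1`: read only the first line
with open(path) as f:
    first_line = f.readline().rstrip('\n')

# Equivalent of `wc -l`: count lines without loading the whole file
with open(path) as f:
    n_lines = sum(1 for _ in f)

os.remove(path)  # clean up the temporary file
print(first_line)
print(n_lines)
```
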
# Useful tips (Keyboard shortcuts etc):
1. Switch between selection mode and cell (edit) mode (Esc / Return)
2. Insert cells (press A to insert above or B to insert below in selection mode)
3. Delete / cut cells (press X in selection mode)
4. Mark several cells (hold Shift and press Up / Down in selection mode)
5. Merge cells (select, then Shift+M)

# Printing to PDF
### (USEFUL FOR HOMEWORK)
**Easiest**: File -> Print Preview. 
Then save that page as a PDF (Ctrl + P, Save as PDF usually works).

**Pro:** Install a LaTeX compiler. Then: File -> Download As -> PDF.

# Quick Review of Python Topics

### Check what Python distribution you are running

In [None]:
# works on Unix-like systems; on Windows, try `!where python` instead
!which python

In [None]:
# Check that it is Python 3
import sys # import built in package
print(sys.version)

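A platform-independent way to answer the same two questions is to ask the running interpreter directly. This is a small sketch using only the standard `sys` module; the variable names are illustrative:

```python
import sys

# Path of the interpreter running this code (works on Windows, macOS and Linux)
print(sys.executable)

# version_info compares like a tuple, which is handy for programmatic checks
is_python3 = sys.version_info >= (3, 0)
print(is_python3)
```
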
## Python as a calculator

In [None]:
# Addition
2.1 + 2

In [None]:
# Multiplication
10*10.0

In [None]:
# Floor division
7//3

In [None]:
# Floating-point (true) division; in Python 2, 7/3 performed integer division and returned 2
7/3

In [None]:
type(2)

In [None]:
type(2.0)

In [None]:
a = 3
b = 5
print (b**a) # ** is exponentiation

In [None]:
print (b%a)  # modulus operator = remainder

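Floor division and modulus belong together: for integers, `b == (b // a) * a + b % a`. The built-in `divmod` returns both at once. A short sketch using the same `a` and `b` as above, plus a negative operand to show that `//` rounds toward negative infinity:

```python
a = 3
b = 5

# Quotient and remainder in one call
quotient, remainder = divmod(b, a)
print(quotient, remainder)  # 1 2

# The identity that ties // and % together
print(b == quotient * a + remainder)  # True

# With a negative operand, // floors (rounds down), so the remainder stays non-negative
print(-5 // 3, -5 % 3)  # -2 1
```
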
In [None]:
type(5) == type(5.0)

In [None]:
# boolean checks
a = True
b = False
print (a and b)

In [None]:
# conditional programming
if 5 == 5:
    print('correct!')
else:
    print('what??')

In [None]:
print (isinstance(1,int))

## String slicing and indices
<img src="resources/spam.png" width="480">

In [None]:
# Strings and slicing
x = "abcdefghijklmnopqrstuvwxyz"

In [None]:
print(x)

In [None]:
print(x[1]) # zero indexed

In [None]:
print (type(x))

In [None]:
print (len(x))

In [None]:
print(x)

In [None]:
print (x[1:6:2]) # start:stop:step

In [None]:
print (x[::3])

In [None]:
print (x[::-1])

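Slices also accept negative indices, which count from the end of the string, and they tolerate out-of-range bounds. A few more cases on the same alphabet string, as a quick sketch:

```python
x = "abcdefghijklmnopqrstuvwxyz"

last_char = x[-1]      # last character
last_three = x[-3:]    # from the third-to-last character to the end
first_three = x[:3]    # from the start up to (not including) index 3
past_the_end = x[100:] # an out-of-range slice returns an empty string, not an error

print(last_char)    # z
print(last_three)   # xyz
print(first_three)  # abc
print(repr(past_the_end))  # ''
```
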
### Manipulating text

In [None]:
# Triple quotes are useful for multiple line strings
y = '''The quick brown 
fox jumped over 
the lazy dog.'''
print (y)

### String operators and methods

In [None]:
# tokenize by space
words = y.split(' ')
print (words)

In [None]:
# remove line-break characters
[w.replace('\n','') for w in words]

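Calling `split()` with no argument splits on any run of whitespace, including newlines, so the separate clean-up step is not needed. A minimal sketch on the same sentence, written here with explicit `\n` characters:

```python
y = 'The quick brown \nfox jumped over \nthe lazy dog.'

# No argument: split on spaces, tabs and newlines, and drop empty strings
clean_words = y.split()
print(clean_words)

# Join the tokens back into a single-line sentence
one_line = ' '.join(clean_words)
print(one_line)  # The quick brown fox jumped over the lazy dog.
```
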
<div class='alert alert-success'>TAB COMPLETION TIPS</div>

In [None]:
words.append('last words')

In [None]:
import pandas as pd

In [None]:
words

In [None]:
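# Type "words." in a cell and press Tab to list the available methods
# (kept as a comment because a bare "words." is a syntax error when run)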

In [None]:
str()

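Tab completion is interactive, but the same information is available programmatically through the built-in `dir()`. A small sketch that lists the public methods of a list (the `words` list here is a shortened, illustrative stand-in):

```python
words = ['The', 'quick', 'brown']

# dir() returns all attribute names; filter out the underscore-prefixed ones
public_methods = [name for name in dir(words) if not name.startswith('_')]
print(public_methods)
```
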
# Data Structures

## **Tuple:** Sequence of Python objects. Immutable.

In [None]:
t = ('a','b', 3)
print (t) 
print (type (t))
t[1]

In [None]:
t[1] = 2 # raises TypeError: tuples are immutable

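To see the immutability error without stopping the notebook, the assignment can be wrapped in `try` / `except`. The sketch also shows tuple unpacking, a common way to read tuple elements; the names `first`, `second` and `third` are illustrative:

```python
t = ('a', 'b', 3)

# Item assignment on a tuple raises TypeError; catch it and show the message
try:
    t[1] = 2
except TypeError as err:
    message = str(err)
    print('TypeError:', message)

# Unpacking assigns each element to its own name
first, second, third = t
print(first, second, third)  # a b 3
```
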
## **List:** Sequence of Python objects. Mutable

In [None]:
y = list() # create empty list
type(y)

In [None]:
type([])

In [None]:
# Append to list
y.append('hello')
y.append('world')
print(y)

In [None]:
y.pop(1)

In [None]:
print(y)

In [None]:
# List addition (merge)
y + ['data-x']

In [None]:
# List multiplication
y*4

In [None]:
# list of numbers
even_nbrs = list(range(0,20,2)) # range has lazy evaluation
print (even_nbrs)

In [None]:
# supports objects of different data types
z = [1,4,'c',4, 2, 6]
print (z)

In [None]:
# list length (number of elements)
print(len(z))

In [None]:
# it's easy to know if an element is in a list
print ('c' in z)

In [None]:
print (z[2])  # print element at index 2

In [None]:
# traverse / loop over all elements in a list
for i in z:
    print (i)

In [None]:
# lists can be sorted in place,
# but only if their elements can be compared with each other
z.sort() # raises TypeError here because z mixes int and str

In [None]:
# z.sort() fails because of the string 'c', so remove it (index 2)
z.pop(2)

In [None]:
z

In [None]:
z.sort() # now it works!
z

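`z.sort()` changes the list in place. The built-in `sorted()` returns a new list instead, and its `key` argument offers one way to order mixed types. The sketch below compares elements by their string form, which is one possible choice and not the only sensible ordering:

```python
z = [1, 4, 'c', 4, 2, 6]

# Keep only the integers, then sort a copy; z itself is not modified
numbers = [item for item in z if isinstance(item, int)]
sorted_numbers = sorted(numbers)
print(sorted_numbers)  # [1, 2, 4, 4, 6]

# key=str compares elements by their string representation
sorted_mixed = sorted(z, key=str)
print(sorted_mixed)  # [1, 2, 4, 4, 6, 'c']
```
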
In [None]:
print (z.count(4))  # how many times is there a 4

In [None]:
# loop examples
for x in z:
    print ("this item is ", x)

In [None]:
# print with index
for i,x in enumerate(z):
    print ("item at index ", i," is ",  x )

In [None]:
# print all even numbers up to an integer
for i in range(0,10,2):
    print (i)

In [None]:
# list comprehension is like f(x) for x as an element of Set X
# S = {x² : x in {0 ... 9}}
S = [x**2 for x in range(10)]
print (S)

In [None]:
# All even elements from S
# M = {x | x in S and x even}
M = [x for x in S if x % 2 == 0]
print (M)

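Each comprehension above is shorthand for a loop that appends to a list. Written out explicitly (the `_loop` names are illustrative), the two are equivalent:

```python
# Same as S = [x**2 for x in range(10)]
S_loop = []
for x in range(10):
    S_loop.append(x**2)

# Same as M = [x for x in S if x % 2 == 0]
M_loop = []
for x in S_loop:
    if x % 2 == 0:
        M_loop.append(x)

print(S_loop == [x**2 for x in range(10)])  # True
print(M_loop)  # [0, 4, 16, 36, 64]
```
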
In [None]:
# Matrix representation with Lists
print([[1,2,3],[4,5,6]]) # 2 x 3 matrix

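With a list of lists, the first index selects the row and the second selects the column. A short sketch of element access and of transposing with `zip` (the variable names are illustrative):

```python
matrix = [[1, 2, 3], [4, 5, 6]]  # 2 x 3 matrix

# Row 1, column 2 (both zero-indexed)
element = matrix[1][2]
print(element)  # 6

# zip(*matrix) pairs up the columns; convert each tuple back to a list
transposed = [list(column) for column in zip(*matrix)]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```
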
# Sets (collection of unique elements)

In [None]:
# a set is not ordered
a = set([1, 2, 3, 3, 3, 4, 5,'a'])
print (a)

In [None]:
b = set('abaacdef')
print (b) # not ordered

In [None]:
print (a|b) # union of a and b

In [None]:
print(a&b) # intersection of a and b

In [None]:
a.remove(5)
print (a) # the element 5 has been removed

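Besides union and intersection, sets support difference and symmetric difference. The sketch below rebuilds the same `a` and `b` and also shows `discard`, which unlike `remove` does not raise an error when the element is missing:

```python
a = set([1, 2, 3, 3, 3, 4, 5, 'a'])
b = set('abaacdef')

only_in_a = a - b      # elements in a but not in b
in_exactly_one = a ^ b # elements in a or b, but not in both

print(only_in_a)
print(in_exactly_one)

# discard() is silent if the element is absent; remove() would raise KeyError
a.discard(99)
```
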
# Dictionaries: Key Value pairs
Almost like JSON data

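The similarity to JSON can be made concrete with the standard `json` module. This sketch round-trips a dictionary through a JSON string and shows one difference: JSON object keys are always strings. The `record` name is illustrative:

```python
import json

record = {'f1': 10, 'f2': 20, 'f3': 25}

# Dictionary -> JSON text -> dictionary
text = json.dumps(record)
restored = json.loads(text)
print(restored == record)  # True

# Non-string keys are converted to strings on the way out
int_keys_restored = json.loads(json.dumps({1: 'one'}))
print(int_keys_restored)  # {'1': 'one'}
```
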
In [None]:
# Dictionaries, many ways to create them
# First way to create a dictionary is just to assign it
D1 = {'f1': 10, 'f2': 20, 'f3':25}              

In [None]:
D1

In [None]:
D1['f2']

In [None]:
# 2. creating a dictionary using dict()
D2 = dict(f1=10, f2=20, f3 = 30)
print (D2['f3'])

In [None]:
# 3. Another way, start with empty dictionary
D3 = {}
D3['f1'] = 10
D3['f2'] = 20
print (D3['f1'])

In [None]:
# Dictionaries can be more complex, i.e. dictionary of dictionaries or of tuples, etc.
D5 = {}
D5['a'] = D1
D5['b'] = D2
print (D5['a']['f3'])

In [None]:
D5

In [None]:
# traversing by key
# keys must be immutable (hashable), e.g. a number, a string or a tuple
for k in D1.keys():
    print (k)

In [None]:
# traversing by values
for v in D1.values(): 
    print(v)

In [None]:
# traverse keys and values together with items()
for k, v in D1.items():                # tuples with keys and values
    print (k,v)

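Looking up a missing key with `D1['f4']` raises a `KeyError`. Two common ways around that are the `in` operator and `get()` with a default value. The sketch also builds a new dictionary with a dict comprehension; `doubled` is an illustrative name:

```python
D1 = {'f1': 10, 'f2': 20, 'f3': 25}

# Membership test checks the keys
has_f4 = 'f4' in D1
print(has_f4)  # False

# get() returns the default instead of raising KeyError
f4_value = D1.get('f4', 0)
print(f4_value)  # 0

# Dict comprehension: same keys, doubled values
doubled = {k: 2 * v for k, v in D1.items()}
print(doubled)  # {'f1': 20, 'f2': 40, 'f3': 50}
```
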
# User input

In [None]:
# input
# raw_input() was renamed to input() in Python v3.x
# The old input() is gone, but you can emulate it with eval(input())

print ("Input a number:")
s = input()  # returns a string
a = int(s)
print ("The number is ", a)

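`int(s)` raises a `ValueError` when the user types something that is not an integer. One way to handle that is a small helper that returns `None` on bad input. The function `parse_int` below is hypothetical, a sketch rather than part of the original notebook, and it takes the string as an argument so it can be tried without typing anything:

```python
def parse_int(text):
    """Return text converted to int, or None if it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None

print(parse_int('42'))    # 42
print(parse_int(' 7 '))   # 7 (surrounding whitespace is ignored by int())
print(parse_int('abc'))   # None
```

In a notebook it could be used as `a = parse_int(input())`.
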
# Import packages

In [None]:
import numpy as np

In [None]:
np.subtract(3,1)

# Functions

In [None]:
def adder(x,y):
    s = x+y
    return s

In [None]:
adder(2,3)

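Functions can also take default values and be called with keyword arguments. The variant below adds a default for `y` and a docstring; this is an illustrative extension of `adder`, not the original definition:

```python
def adder(x, y=1):
    """Return the sum of x and y (y defaults to 1)."""
    return x + y

print(adder(2, 3))       # 5
print(adder(2))          # 3, because y falls back to its default
print(adder(y=10, x=5))  # 15, keyword arguments can come in any order
print(adder('a', 'b'))   # ab, + also concatenates strings
```
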
# Classes

In [None]:
class Holiday():
    def __init__(self,holiday='Holidays'):
        self.base = 'Happy {}!'
        self.greeting = self.base.format(holiday)
    
    def greet(self):
        print(self.greeting)
        
easter = Holiday('Easter')
hanukkah = Holiday('Hanukkah')

In [None]:
easter.greeting

In [None]:
hanukkah.greet()

In [None]:
# extend class

class Holiday_update(Holiday):
    
    def update_greeting(self, new_holiday):
        self.greeting = self.base.format(new_holiday)

In [None]:
hhg = Holiday_update('July 4th')

In [None]:
hhg.greet()

In [None]:
hhg.update_greeting('Labor day / End of Burning Man')
hhg.greet()

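A subclass can also override a parent method and reuse the parent's logic through `super()`. The sketch below repeats the `Holiday` class so that it runs on its own; `LoudHoliday` is a hypothetical subclass added for illustration:

```python
class Holiday():
    def __init__(self, holiday='Holidays'):
        self.base = 'Happy {}!'
        self.greeting = self.base.format(holiday)

    def greet(self):
        print(self.greeting)


class LoudHoliday(Holiday):
    def __init__(self, holiday='Holidays'):
        # Let the parent build the greeting, then modify it
        super().__init__(holiday)
        self.greeting = self.greeting.upper()


loud = LoudHoliday('Easter')
loud.greet()  # HAPPY EASTER!

# An instance of the subclass is also an instance of the parent class
print(isinstance(loud, Holiday))  # True
```
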
<div id='bottom'></div>