### Essential Python Libraries
- **NumPy**, short for Numerical Python, provides the data structures and algorithms needed for most scientific applications involving numerical data in Python.
    - NumPy contains: 
        - A fast and efficient multidimensional array object *ndarray*
        - Functions for performing element-wise computations with arrays or mathematical operations between arrays. 
        - Tools for reading and writing array-based datasets to disk
        - Linear algebra operations
    - Beyond the fast array-processing capabilities that NumPy adds to Python, one of its primary uses in data analysis is as a container for data to be passed between algorithms and libraries.
        - For numerical data, NumPy arrays are more efficient for storing and manipulating data than the other built-in Python data structures. Thus many numerical computing tools for Python either assume NumPy arrays as a primary data structure or target seamless interoperability with NumPy.
       
- **pandas** provides high-level data structures and functions designed to make working with structured or tabular data fast, easy, and expressive. The primary objects in pandas are the `DataFrame`, a tabular, column-oriented data structure with both row and column labels, and the `Series`, a one-dimensional labeled array object.
    - pandas blends the high-performance, array-computing ideas of NumPy with the flexible data manipulation capabilities of spreadsheets and relational databases.
        - It provides sophisticated indexing functionality to make it easy to reshape, slice and dice, perform aggregations, and select subsets of data.
    - Because pandas was initially built to solve finance and business analytics problems, it has especially deep time series functionality and tools well suited for working with time-indexed data generated by business processes.
    
- **matplotlib** is the most popular Python library for producing plots and other two-dimensional data visualizations. 

- **SciPy** is a collection of packages addressing a number of different standard problem domains in scientific computing. 
    - Here are some of the packages included:
        - `scipy.integrate` includes numerical integration routines and differential equation solvers.
        - `scipy.linalg` includes linear algebra routines and matrix decompositions extending beyond those provided in `numpy.linalg`.
        - `scipy.optimize` includes function optimizers (minimizers) and root finding algorithms.
        - `scipy.signal` includes signal processing tools.
        - `scipy.sparse` includes sparse matrices and sparse linear system solvers.
        - `scipy.stats` includes standard continuous and discrete probability distributions (density functions, samplers, cumulative distribution functions), various statistical tests, and more descriptive statistics.
- **scikit-learn** is the premier general-purpose machine learning toolkit in Python. 
    - It includes submodules for models such as: 
        - Classification: SVM, nearest neighbors, random forest, logistic regression, etc.
        - Regression: Lasso, ridge regression, etc.
        - Clustering: k-means, spectral clustering, etc.
        - Dimensionality reduction: PCA, feature selection, matrix factorization, etc.
        - Model selection: Grid search, cross-validation, metrics
        - Preprocessing: Feature extraction, normalization
- **statsmodels** is a statistical analysis package that contains algorithms for classical statistics and econometrics. 
    - It includes submodules such as: 
        - Regression models: Linear regression, generalized linear models, robust linear models, linear mixed effects models, etc.
        - Analysis of variance (ANOVA)
        - Time series analysis: AR, ARMA, ARIMA, VAR, and other models
        - Nonparametric methods: Kernel density estimation, kernel regression
        - Visualization of statistical model results
    - statsmodels is more focused on statistical inference, providing uncertainty estimates and p-values for parameters. scikit-learn, by contrast, is more prediction-focused.
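
To make the NumPy points above concrete, here is a minimal sketch of element-wise computation on an `ndarray` next to the equivalent pure-Python approach. The variable names are illustrative only.

```python
import numpy as np

# The same 10 values as a Python list and as a NumPy ndarray
values_list = list(range(10))
values_arr = np.arange(10)

# Pure Python: each item must be visited explicitly (here via a comprehension)
squared_list = [v ** 2 for v in values_list]

# NumPy: arithmetic operators apply element-wise to the whole array at once
squared_arr = values_arr ** 2

# Operations between arrays of the same shape are element-wise too
summed = values_arr + squared_arr

# A taste of the linear algebra tools: the dot product of two 1-D arrays
dot = np.dot(values_arr, values_arr)

print(squared_arr)  # [ 0  1  4  9 16 25 36 49 64 81]
print(dot)          # 285
```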
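
Similarly, a minimal sketch of the two core pandas objects, a labeled `Series` and a `DataFrame` with row and column labels. The store names, labels, and numbers are made up purely for illustration.

```python
import pandas as pd

# A Series: a one-dimensional array with an index of labels
prices = pd.Series([10.5, 11.0, 9.75], index=["mon", "tue", "wed"])

# A DataFrame: a table whose rows and columns are both labeled
df = pd.DataFrame(
    {"store": ["A", "A", "B", "B"], "sales": [100, 150, 80, 120]},
    index=["q1", "q2", "q1", "q2"],
)

# Select a value by its label
tue_price = prices["tue"]

# Select a subset of rows with a boolean condition
big = df[df["sales"] > 90]

# Group and aggregate: total sales per store
totals = df.groupby("store")["sales"].sum()
print(totals)
```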
    
Data science tasks generally fall into a number of different broad groups:
- *Interacting with the outside world*
    - Reading and writing a variety of file formats and data stores
- *Preparation*
    - Cleaning, munging, combining, normalizing, reshaping, slicing and dicing, and transforming data for analysis. 
- *Transformation*
    - Applying mathematical and statistical operations to groups of datasets to derive new datasets. 
- *Modeling and computation*
    - Connecting your data to statistical models, machine learning algorithms, or other computational tools. 
- *Presentation*
    - Creating interactive or static graphical visualizations or textual summaries. 
    
**Import Conventions**
```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
```
Note: it's considered bad practice in Python software development to import everything from a large package (e.g., `from numpy import *`). Whenever possible, import only what you need.
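
A small illustration of why wildcard imports are discouraged. It uses the standard-library `math` module rather than a large package so the effect is easy to see: `from math import *` silently replaces the built-in `pow` with `math.pow`, which always returns a float.

```python
# Preferred: import the module itself (or only the names you need)
import math

before = pow(2, 3)         # the built-in pow: returns 8, an int
explicit = math.pow(2, 3)  # math.pow is still reachable, nothing is clobbered

# Discouraged: copies every public name from math into this namespace,
# including math.pow, which now shadows the built-in pow
from math import *

after = pow(2, 3)          # now this is math.pow: returns 8.0, a float
print(before, after)       # 8 8.0
```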

**Jargon**
- *Munge/munging/wrangling*
    - Describes the process of manipulating unstructured and/or messy data into a structured or clean form.
- *Pseudocode*
    - A description of an algorithm or process that takes a code-like form while not being actual valid code.
- *Syntactic sugar*
    - Programming syntax that does not add new features, but makes something more convenient or easier to type.
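
Two everyday examples of syntactic sugar in Python, sketched side by side with the longer form they roughly stand in for:

```python
# Without the sugar: build a list of squares with an explicit loop
squares_loop = []
for n in range(5):
    squares_loop.append(n * n)

# With the sugar: a list comprehension expresses the same thing in one line
squares_comp = [n * n for n in range(5)]

# Augmented assignment is sugar too: for numbers, `total += 1`
# behaves like `total = total + 1`
total = 0
total += 1

print(squares_comp, total)  # [0, 1, 4, 9, 16] 1
```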

**Magic Commands**
IPython's special commands (which are not built into Python itself) are known as "magic" commands. These are designed to facilitate common tasks and enable you to easily control the behavior of the IPython system. All magic commands are prefixed by the symbol `%`. 

|**Command**|**Description**|
|-----------|---------------|
|`%quickref` |Display the IPython Quick Reference Card|
|`%magic`| Display detailed documentation for all of the available magic commands|
|`%debug`| Enter the interactive debugger at the bottom of the last exception traceback|
|`%hist`| Print command input (and optionally output) history|
|`%pdb` |Automatically enter debugger after any exception|
|`%paste`| Execute preformatted Python code from clipboard|
|`%cpaste`| Open a special prompt for manually pasting Python code to be executed|
|`%reset`| Delete all variables/names defined in interactive namespace|
|`%page OBJECT`| Pretty-print the object and display it through a pager|
|`%run script.py`| Run a Python script inside IPython|
|`%prun statement`| Execute statement with cProfile and report the profiler output|
|`%time statement`| Report the execution time of a single statement|
|`%timeit statement`| Run a statement multiple times to compute an ensemble average execution time; useful for timing code with very short execution time|
|`%who`, `%who_ls`, `%whos`| Display variables defined in interactive namespace, with varying levels of information/verbosity|
|`%xdel variable`| Delete a variable and attempt to clear any references to the object in the IPython internals|
|`%matplotlib inline`| Display matplotlib plots inline, below the notebook cell that produced them|
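
Magic commands only work inside IPython/Jupyter. In a plain Python script, the standard-library `timeit` module offers a rough equivalent of `%timeit`; the following is a minimal sketch, and the statement being timed is arbitrary. (`%timeit` itself also repeats several batches and reports a mean and spread, which this sketch does not.)

```python
import timeit

# Roughly what `%timeit sum(range(1000))` does: run the statement many times
# and report the average time per run
runs = 1000
total_seconds = timeit.timeit("sum(range(1000))", number=runs)
per_run = total_seconds / runs

print(f"{per_run * 1e6:.2f} microseconds per run")
```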

**Python Language Basics**
- Everything is an object
    - Every number, string, data structure, function, class, module, and so on exists in the Python interpreter in its own "box," which is referred to as a Python object. Each object has an associated *type* and internal data.
- Function and object method calls
    - You can call functions using parentheses and passing zero or more arguments, optionally assigning the returned value to a variable.
        - `result = f(x,y,z)`
    - Almost every object in Python has attached functions, known as *methods*, that have access to the object's internal contents. 
        - `obj.some_method(x, y, z)`
        
- Binary operators

|**Operation**|**Description**|
|-------------|---------------|
|`a + b`|Add a and b|
|`a - b` |Subtract b from a|
|`a * b` |Multiply a by b|
|`a / b` |Divide a by b|
|`a // b`| Floor-divide a by b, dropping any fractional remainder|
|`a ** b` |Raise a to the b power|
|`a & b` |True if both a and b are True; for integers, take the bitwise AND|
|`a \| b` |True if either a or b is True; for integers, take the bitwise OR|
|`a ^ b` |For booleans, True if a or b is True, but not both; for integers, take the bitwise EXCLUSIVE-OR|
|`a == b` |True if a equals b|
|`a != b`| True if a is not equal to b|
|`a < b`, `a <= b`|True if a is less than (less than or equal to) b|
|`a > b`, `a >= b`|True if a is greater than (greater than or equal to) b|
|`a is b` |True if a and b reference the same Python object|
|`a is not b` |True if a and b reference different Python objects|
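
A short sketch of a few operators from the table whose behavior is easy to misremember: true vs. floor division, `&`/`|`/`^` on booleans vs. integers, and `==` (equal values) vs. `is` (same object).

```python
# Division
print(7 / 2)    # 3.5  (true division always returns a float)
print(7 // 2)   # 3    (floor division drops the fractional part)
print(-7 // 2)  # -4   (floor rounds toward negative infinity, not toward zero)
print(2 ** 10)  # 1024

# &, |, ^ act logically on booleans and bitwise on integers
print(True ^ True)  # False (exclusive or: not both)
bits = (0b1100 & 0b1010, 0b1100 | 0b1010, 0b1100 ^ 0b1010)
print(bits)         # (8, 14, 6)

# == compares values, while `is` compares object identity
a = [1, 2, 3]
b = a          # b refers to the very same list object as a
c = list(a)    # c is a new list with equal contents
print(a == c, a is c)  # True False
print(a is b)          # True
```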

**Dates and times**
The built-in Python `datetime` module provides `datetime`, `date`, and `time` types. Once you have a `datetime` object, you can use its attributes and methods:
- `.day` - returns the day of the month
- `.date()` - returns the `date` part (displayed as year-month-day)
- `.time()` - returns the `time` part (displayed as hour:minute:second)
- `.strftime('%m/%d/%Y %H:%M')` - formats a `datetime` as a string.

Strings can be converted (parsed) into `datetime` objects with `datetime.strptime`, e.g., `datetime.strptime('20091031', '%Y%m%d')`.
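
A minimal round trip between the two directions: `strptime` parses a string into a `datetime`, and `strftime` formats it back out, here with a different format string.

```python
from datetime import datetime

# Parse a string into a datetime using an explicit format
parsed = datetime.strptime('20091031', '%Y%m%d')
print(repr(parsed))  # datetime.datetime(2009, 10, 31, 0, 0)

# Format the same datetime as a string with a different format
text = parsed.strftime('%m/%d/%Y %H:%M')
print(text)          # 10/31/2009 00:00
```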

- Datetime format specification

|**Code**|**Description**|
|--------|----------------|
|`%Y` |Four-digit year|
|`%y` |Two-digit year|
|`%m` |Two-digit month [01, 12]|
|`%d`|Two-digit day [01, 31]|
|`%H` |Hour (24-hour clock) [00, 23]|
|`%I` |Hour (12-hour clock) [01, 12]|
|`%M` |Two-digit minute [00, 59]|
|`%S` |Second [00, 61] (seconds 60, 61 account for leap seconds)|
|`%w` |Weekday as integer [0 (Sunday), 6]|
|`%U` |Week number of the year [00, 53]; Sunday is considered the first day of the week, and days before the first Sunday of the year are “week 0”|
|`%W` |Week number of the year [00, 53]; Monday is considered the first day of the week, and days before the first Monday of the year are “week 0”|
|`%z` |UTC time zone offset as +HHMM or -HHMM; empty if time zone naive|
|`%F` |Shortcut for %Y-%m-%d (e.g., 2012-04-18)|
|`%D` |Shortcut for %m/%d/%y (e.g., 04/18/12)|
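
A sketch that exercises several codes from the table on a single timestamp. The date is chosen because 1 January 2011 was a Saturday, so 2 January 2011 is a Sunday, which shows how `%U` (Sunday-first weeks) and `%W` (Monday-first weeks) can disagree.

```python
from datetime import datetime

d = datetime(2011, 1, 2, 15, 4, 5)  # Sunday, 2 January 2011, 15:04:05

print(d.strftime('%Y | %y'))   # 2011 | 11
print(d.strftime('%H | %I'))   # 15 | 03
print(d.strftime('%M:%S'))     # 04:05
print(d.strftime('%w'))        # 0  (Sunday)
print(d.strftime('%U | %W'))   # 01 | 00  (first Sunday starts week 1; first Monday is still ahead)
print(repr(d.strftime('%z')))  # ''  (d is time zone naive)
```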

The difference of two `datetime` objects produces a `datetime.timedelta` type. Adding a `timedelta` to a `datetime` produces a new shifted `datetime`. 
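
A minimal sketch of that arithmetic; the second timestamp, `later`, is an arbitrary value chosen for illustration.

```python
from datetime import datetime, timedelta

dt = datetime(2011, 10, 29, 20, 30, 21)
later = datetime(2011, 11, 15, 22, 30)

# Subtracting two datetimes gives a timedelta
delta = later - dt
print(delta)    # 17 days, 1:59:39

# Adding a timedelta shifts a datetime
shifted = dt + timedelta(hours=12)
print(shifted)  # 2011-10-30 08:30:21

# Adding the difference back recovers the later datetime
print(dt + delta == later)  # True
```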

```python
from datetime import datetime

dt = datetime(2011, 10, 29, 20, 30, 21)
print(dt.day)     # 29
print(dt.date())  # 2011-10-29
print(dt.time())  # 20:30:21
```


**Data Structures and Sequences**
- A **Tuple** is a fixed-length, immutable sequence of Python objects.
    - Unpacking tuples: if you try to assign a tuple-like expression of variables, Python will attempt to *unpack* the value on the right-hand side of the equals sign.
      ```python
      tup = (4, 5, 6)
      a, b, c = tup
      ```
        - Here `a=4`, `b=5`, and `c=6`.
    - A common use of variable unpacking is iterating over sequences of tuples or lists

```python
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

for a, b, c in seq:
    print('a={0},b={1},c={2}'.format(a, b, c))
# a=1,b=2,c=3
# a=4,b=5,c=6
# a=7,b=8,c=9
```
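
Two other common unpacking idioms, sketched briefly: swapping variables without a temporary, and collecting leftover values with a starred target.

```python
# Swapping: the right-hand side builds a tuple, which is then unpacked
a, b = 1, 2
a, b = b, a
print(a, b)  # 2 1

# *rest collects any remaining values into a list
values = (1, 2, 3, 4, 5)
first, second, *rest = values
print(first, second, rest)  # 1 2 [3, 4, 5]

# Without a starred target, the number of names must match the number of values
try:
    x, y = values
except ValueError as err:
    print("ValueError:", err)
```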


Encoding a Unicode `str` as UTF-8 produces a `bytes` object:
```python
val = "español"
val_utf8 = val.encode('utf-8')
val_utf8        # b'espa\xc3\xb1ol'
type(val_utf8)  # bytes
type(val)       # str
```
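
A short follow-up sketch: UTF-8 is a variable-width encoding, so the character `ñ` takes two bytes, and decoding with the same encoding restores the original string.

```python
val = "español"
val_utf8 = val.encode('utf-8')

# ñ needs two bytes in UTF-8, so the bytes object is one element longer
print(len(val), len(val_utf8))  # 7 8

# Decoding with the same encoding recovers the original str
decoded = val_utf8.decode('utf-8')
print(decoded == val)           # True
```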

A conditional (ternary) expression chooses between two values in a single line:
```python
x = 5
'Non-negative' if x >= 0 else 'Negative'  # 'Non-negative'
```
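
For comparison, a sketch of the same choice written first as a conditional expression and then as the equivalent multi-line `if`/`else` statement:

```python
x = 5

# The conditional expression...
label = 'Non-negative' if x >= 0 else 'Negative'

# ...produces the same result as this if/else statement
if x >= 0:
    label_long = 'Non-negative'
else:
    label_long = 'Negative'

print(label, label_long)  # Non-negative Non-negative
```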

`tuple()` converts any iterable, such as a list, into a tuple:
```python
l = [4, 0, 3]
tuple(l)  # (4, 0, 3)
```
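
A minimal sketch of what the conversion buys you: the list stays mutable, while the new tuple is an independent, immutable copy.

```python
l = [4, 0, 3]
t = tuple(l)

# Lists are mutable: items can be reassigned in place
l[0] = 99

# Tuples are immutable: item assignment raises TypeError
try:
    t[0] = 99
except TypeError:
    print("tuples do not support item assignment")

print(l, t)  # [99, 0, 3] (4, 0, 3)
```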