# Introduction to Data Science in Python

### Essential Python Libraries

* NumPy (Numerical Python)
* pandas
* matplotlib
* IPython and Jupyter
* SciPy
* scikit-learn
* statsmodels
* Miniconda, a minimal installation of the conda package manager, along with conda-forge, a community-maintained software distribution based on conda.

[Reference for installation](https://wesmckinney.com/book/preliminaries#installation_mac)

### Running the Jupyter Notebook

To start the Jupyter Notebook server, run `jupyter notebook` in a terminal.

### Running the IPython Shell

To start the IPython shell, run `ipython` in a terminal.

### Activating a conda Environment

```
conda activate <venv>
```

## Fundamentals of Data Manipulation with Python

Note that list concatenation by addition is a comparatively expensive operation, since a new list must be created and the objects copied over. Using `extend` to append elements to an existing list is usually preferable, especially if you are building up a large list:

```python
list_of_lists = [[i, i + 1, i + 2] for i in range(5)]
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)

everything
```

```
[0, 1, 2, 1, 2, 3, 2, 3, 4, 3, 4, 5, 4, 5, 6]
```
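
To make the difference concrete, here is a minimal sketch that builds the same list both ways. The names `concatenated`, `extended`, and `original_id` are illustrative and not part of the original notes. Both approaches give the same result, but only `extend` keeps working on the same list object.

```python
# Same input data as the example above
list_of_lists = [[i, i + 1, i + 2] for i in range(5)]

# Concatenation: every `+` creates a new list and copies all elements so far
concatenated = []
for chunk in list_of_lists:
    concatenated = concatenated + chunk

# extend: appends the elements to the existing list in place
extended = []
original_id = id(extended)
for chunk in list_of_lists:
    extended.extend(chunk)

same_result = concatenated == extended      # True
same_object = id(extended) == original_id   # True: no new list was created
print(same_result, same_object)
```

For a handful of short lists the cost difference is negligible; it starts to matter when the accumulated list grows large, because each concatenation copies everything accumulated so far.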

You can select sections of most sequence types by using slice notation, which in its basic form consists of `start:stop` passed to the indexing operator `[]`. Slices can also be assigned with a sequence:

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[3:5] = [6, 3]

seq
```

```
[7, 2, 3, 6, 3, 6, 0, 1]
```
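
As a rough illustration of the `start:stop` form: the element at `start` is included, the element at `stop` is not, either bound can be omitted, and negative indices count from the end of the sequence. The variable names below are illustrative only, and the list is recreated so the sketch does not depend on the assignment above.

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]

middle = seq[1:5]      # indexes 1 through 4 -> [2, 3, 7, 5]
head = seq[:5]         # start omitted: from the beginning -> [7, 2, 3, 7, 5]
tail = seq[3:]         # stop omitted: through the end -> [7, 5, 6, 0, 1]
last_four = seq[-4:]   # negative start counts from the end -> [5, 6, 0, 1]
inner = seq[-6:-2]     # indexes 2 through 5 -> [3, 7, 5, 6]

# The number of elements in a basic slice is stop - start
print(len(middle) == 5 - 1)
```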

A `step` can also be used after a second colon to, say, take every other element:

```python
seq[::2]
```

```
[7, 3, 3, 0]
```

A clever use of this is to pass -1, which has the useful effect of reversing a list or tuple:

```python
seq[::-1]
```

```
[1, 0, 6, 3, 6, 3, 2, 7]
```

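You can merge one dictionary into another using the `update` method. It changes the dictionary in place, so any key that already exists has its old value overwritten: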

```python
d1 = {"a": 1, "b": 2}
d1.update({"b": "foo", "c": 12})
d1
```

```
{'a': 1, 'b': 'foo', 'c': 12}
```

You will often end up with **two sequences that you want to pair up element-wise in a dictionary**. As a first cut, you might write code like this:

```python
mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value
```

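Since a dictionary is essentially a collection of 2-tuples, the `dict` function accepts an iterable of 2-tuples, so the loop can be replaced by a single call:

```python
tuples = zip(range(5), reversed(range(5)))
tuples
```

```
<zip at 0x1040eb680>
```

```python
mapping = dict(tuples)
mapping
```

```
{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}
```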
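
The sketch below puts the pairing approaches side by side. The contents of `key_list` and `value_list` are made-up sample data (the notes do not define them), and the result names are illustrative. It also shows one behavior worth knowing: `zip` stops at the shorter input, so unmatched keys are silently dropped.

```python
# Hypothetical sample data; the original notes do not define these lists
key_list = ["a", "b", "c"]
value_list = [1, 2, 3]

# Explicit loop, as in the first-cut version
mapping_loop = {}
for key, value in zip(key_list, value_list):
    mapping_loop[key] = value

# dict() consumes the iterable of 2-tuples produced by zip
mapping_dict = dict(zip(key_list, value_list))

# Dictionary comprehension, useful when keys or values need transforming
mapping_comp = {key: value for key, value in zip(key_list, value_list)}

# zip stops at the shorter sequence, so "c" gets no entry here
truncated = dict(zip(key_list, [1, 2]))

print(mapping_loop == mapping_dict == mapping_comp)
```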

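A common pattern is collecting values under a key, for example grouping a list of words by their first letters. The `setdefault` dictionary method simplifies this: it returns the value stored under a key, first inserting the given default if the key is missing: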

```python
words = ["apple", "bat", "bar", "atom", "book"]
by_letter = {}

for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)

by_letter
```

```
{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
```
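
For comparison, this is roughly what the same grouping looks like when the missing-key case is handled by hand. The names `by_letter_longhand` and `by_letter_setdefault` are illustrative; the point is only that `setdefault` folds the `if`/`else` branch into one call.

```python
words = ["apple", "bat", "bar", "atom", "book"]

# Longhand: check for the key explicitly before appending
by_letter_longhand = {}
for word in words:
    letter = word[0]
    if letter not in by_letter_longhand:
        by_letter_longhand[letter] = [word]
    else:
        by_letter_longhand[letter].append(word)

# setdefault: insert an empty list on first sight of a letter, then append
by_letter_setdefault = {}
for word in words:
    by_letter_setdefault.setdefault(word[0], []).append(word)

print(by_letter_longhand == by_letter_setdefault)
```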

The standard-library `collections` module has a useful class, `defaultdict`, which makes this even easier. To create one, you pass a type or function for generating the default value for each slot in the dictionary:

```python
from collections import defaultdict
by_letter = defaultdict(list)

for word in words:
    by_letter[word[0]].append(word)

by_letter
```

```
defaultdict(list, {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']})
```
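
The default factory does not have to be `list`. As a small sketch (the names `counts` and `labels` are illustrative), passing `int` gives a counter that starts at zero, and any zero-argument function, such as a `lambda`, works as well. Note that merely reading a missing key with `[]` inserts it.

```python
from collections import defaultdict

words = ["apple", "bat", "bar", "atom", "book"]

# int() returns 0, so every new key starts at zero
counts = defaultdict(int)
for word in words:
    counts[word[0]] += 1

print(dict(counts))  # {'a': 2, 'b': 3}

# Any zero-argument callable can serve as the default factory
labels = defaultdict(lambda: "unknown")
labels["x"] = "known"
missing_value = labels["y"]   # "unknown"

# Looking up "y" with [] created the entry as a side effect
inserted = "y" in labels      # True
```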

While the values of a dictionary can be any Python object, the keys generally have to be immutable objects like scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable, too). The technical term here is hashability. You can check whether an object is hashable (can be used as a key in a dictionary) with the `hash` function. The integer returned for a string typically differs between interpreter sessions, because Python randomizes string hashing by default:

```python
hash("string")
```

```
7322611513465327997
```

```python
hash([1,2])
```

```
TypeError: unhashable type: 'list'
```
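
One way to use a list as a key is to convert it to a tuple, which is hashable as long as all of its elements are. The sketch below illustrates this; `is_hashable` is a hypothetical helper written for this example, not a built-in.

```python
d = {}
key = [1, 2, 3]

# A list cannot be used as a dictionary key
try:
    d[key] = 5
    raised = False
except TypeError:
    raised = True

# Converting the list to a tuple makes it usable as a key
d[tuple(key)] = 5
print(d)  # {(1, 2, 3): 5}


def is_hashable(obj):
    """Hypothetical helper: report whether hash() accepts the object."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


nested_tuple_ok = is_hashable((1, (2, 3)))    # True: every element is hashable
tuple_with_list = is_hashable((1, [2, 3]))    # False: the inner list is not
```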