# The Building Blocks: Data Structures in Python

## Why are we using Python, again?
1. Portable: cross-platform and free/open source ([Check it out on GitHub!](https://github.com/python/))
2. Consistent: while this principle is difficult to grasp at first pass, everything in Python is designed and built on several core principles. (See, e.g., [this talk](https://www.youtube.com/watch?v=cKPlPJyQrt4))
3. Productive: quickly build out an application with less code
4. Prolific: a vast ecosystem of libraries for almost any application
5. Readable: designed for readability, with a shared style guide (See [PEP 8](https://www.python.org/dev/peps/pep-0008/))

## Where to start?
Starting any new paradigm is a bit of a pain, as one has to become accustomed to a new syntax, a new style, and new data structures for getting things done. In this tutorial, we'll look at getting some of the dirty work out of the way so that we can turn our focus to the practical application of these things. In fact, every one of these sessions will provide examples using the principles learned to build out some simple application.

## How to run a Python application
1. Install Python (see instructions elsewhere)
2. Create a file called `file.py`
3. Open the file in your favorite editor, and include the text `print('Hello World')`
4. Save the file
5. Open PowerShell/cmd in the directory where your file is located
6. Run the command `python file.py`

## Okay, now let's dive in
* Types
* Variables
* Fundamental data structures
* Application: a command-line file-search utility

In [2]:
# numeric types
print(42)

42


In [3]:
type(42)

int

In [7]:
id(42)

1928064752

In [5]:
def print_data(x):
    print(x)
    print(type(x))
    print(id(x))

In [6]:
print_data('Hello world')

Hello world
<class 'str'>
2062689607472


In [8]:
# We can assign these values to a variable
s = 'Hello world'
print_data(s)

Hello world
<class 'str'>
2062689068144


In [9]:
i = 42
print_data(i)

42
<class 'int'>
1928064752


In [10]:
i = 42.0
print_data(i)

42.0
<class 'float'>
2062687520256


## What can we do with numbers?

In [11]:
a = 12
b = 4
c = -7
d = 0

In [12]:
a + b + c  # addition 

9

In [13]:
a - b - c  # subtraction

15

In [15]:
print(a / b)  # true division
print(a / c)  
print(a // b)  # floor division (rounds down, toward negative infinity)
print(a // c)  

3.0
-1.7142857142857142
3
-2


In [19]:
print(a % b)  # modulus/remainder
print(a % c)

0
-2
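
The negative results above can look surprising. As a small worked check (reusing the same values for `a` and `c`), floor division and modulus are tied together by the identity `a == (a // c) * c + (a % c)`:

```python
a, c = 12, -7

quotient = a // c      # floor division rounds toward negative infinity
remainder = a % c      # the remainder takes the sign of the divisor

print(quotient, remainder)
print(quotient * c + remainder == a)   # the identity that ties the two together
print(divmod(a, c))                    # both results from a single call
```

Because `12 / -7` is about `-1.71`, rounding down gives `-2`, and the remainder has to be `-2` to make the identity hold.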


In [16]:
a * b  # multiplication

48

In [18]:
print(a ** b)  # power
print(a ** c)

20736
2.790816472336534e-08


In [20]:
# errors for illegal operations
a / d

ZeroDivisionError: division by zero

## Booleans: tell me the truth

In [26]:
print_data(True)
print_data(False)

True
<class 'bool'>
1927807152
False
<class 'bool'>
1928063440


In [25]:
print(True and False)
print(True or False)
print(not (not False and not True))

False
True
True


* Variables point to some place in memory where data is stored
* Variables can also point to collections of values.
* The primary collections/data structures are:
	* list
	* tuple
	* dict
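
To see what "pointing" means in practice, here is a minimal sketch (the names `first`, `second`, and `third` are made up for illustration). Two variables assigned from the same list refer to a single object, so a change made through one name is visible through the other.

```python
# Two names, one object: assignment copies the reference, not the data.
first = [1, 2, 3]
second = first            # `second` points at the same list as `first`

second.append(4)          # mutate the list through one name...
print(first)              # ...and the change is visible under the other name
print(first is second)    # `is` compares identity (same id), not equality

# An explicit copy creates a new, independent list.
third = list(first)
third.append(5)
print(first == third, first is third)
```

This matters mostly for mutable collections such as lists and dicts; with immutable values like numbers and strings there is nothing to change in place.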

## The Immutables

### Strings: Understanding Collection Basics

In [27]:
s = 'Hello world'
s

'Hello world'

#### Indexing Basics

In [31]:
# reference parts of string by its index
s[0]  # first element

'H'

In [33]:
s[len(s) - 1], s[-1]  # access from end of string

('d', 'd')

In [34]:
# accessing a range
print(s[0:2])  # start at 0, end at 1 (last number is exclusive)
print(s[2:])  # omit stopping point to include rest of string

He
llo world


In [35]:
# only including the stopping point
print(s[:2])
print(s[:-1])

He
Hello worl


In [37]:
# add a step as well, though I haven't found much use for this
print(s[::1])  # the default is 1
print(s[2:8:2])
print(s[::-1])  # every letter backwards

Hello world
low
dlrow olleH


In [38]:
# so, what is "immutable"?
s[0] = 'a'

TypeError: 'str' object does not support item assignment
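
Since a string cannot be changed in place, the usual approach is to build a new one. A minimal sketch (the names `changed` and `shouted` are just for illustration):

```python
s = 'Hello world'

# Slicing plus concatenation builds a brand-new string; `s` itself is untouched.
changed = 'J' + s[1:]
print(changed)
print(s)

# String methods also return new strings rather than modifying in place.
shouted = s.upper()
print(shouted)
```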

### Tuples: Unchanging collections/records
Use a tuple when you have data that should be stored together, like a record.

In [39]:
t = ('rainy', 25.9, 13.3, 65)
print_data(t)

('rainy', 25.9, 13.3, 65)
<class 'tuple'>
2062688874648


In [40]:
bad_tuple = (1)
print_data(bad_tuple)  # not a tuple!

1
<class 'int'>
1928063440


In [41]:
good_tuple = (1, )
print_data(good_tuple)

(1,)
<class 'tuple'>
2062688859696


In [44]:
# implied tuples are used heavily
a, b = 1, 2
b, a = a, b
a, b

(2, 1)
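
The same unpacking works on a record-style tuple. In this sketch the field names (`condition`, `high`, `low`, `humidity`) are guesses for illustration only; the tutorial does not say what each number in `t` represents.

```python
t = ('rainy', 25.9, 13.3, 65)

# Unpack the record into one name per field (hypothetical field names).
condition, high, low, humidity = t
print(condition, humidity)

# The number of names must match the number of elements in the tuple.
print(len(t))
```

Giving each position a name is usually easier to read than repeating `t[0]`, `t[3]`, and so on.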

## Lists
* Great for ordered data

In [45]:
lst = []
len(lst)

0

In [46]:
lst.append(2)
len(lst)

1

In [47]:
lst.insert(0, 5)  # insert a `5` at index 0
lst

[5, 2]

In [48]:
lst.extend([5, 2, 7, 13, 9, 'hi'])
lst

[5, 2, 5, 2, 7, 13, 9, 'hi']

In [49]:
lst.count(5)

2

In [52]:
lst.index('hi')

7

In [73]:
'hi' in lst

True

In [53]:
# remove last element with 'pop'
lst.pop()
lst

[5, 2, 5, 2, 7, 13, 9]

In [57]:
lst.sort()
lst

[2, 2, 5, 5, 7, 9, 13]

In [74]:
lst.index('hi')

ValueError: 'hi' is not in list

In [77]:
if 'hi' in lst:
    print(lst.index('hi'))
else:
    print('Good bye')

Good bye


### Combining lists with tuples

In [69]:
lst = [('Los Angeles', 80.0, 2), ('Khartoum', 97.2, 3), ('Tsetserleg', -20.5, 5), ('Tsetserleg', -20.5, 5)]
lst

[('Los Angeles', 80.0, 2),
 ('Khartoum', 97.2, 3),
 ('Tsetserleg', -20.5, 5),
 ('Tsetserleg', -20.5, 5)]

In [61]:
# access the index and record
lst[0][2], lst[-1][0]

(2, 'Tsetserleg')

In [72]:
# iterate through items
for element in lst:
    if element[2] > 4:
        print(element[0])
    elif 'a' in element[0]:
        print('ahhh')

ahhh
Tsetserleg
Tsetserleg


In [63]:
5 > 2, 4 <= 4

(True, True)

## Sets (Unordered Unique Lists)
* Much quicker membership checks than a list: `O(1)` on average

In [70]:
set(lst)

{('Khartoum', 97.2, 3), ('Los Angeles', 80.0, 2), ('Tsetserleg', -20.5, 5)}

In [71]:
print_data({1, 2, 3})

{1, 2, 3}
<class 'set'>
2062688486344
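
A short sketch of what sets are typically used for: removing duplicates, testing membership, and comparing groups. The city list and the `warm` set are invented for this example.

```python
cities_seen = ['Khartoum', 'Tsetserleg', 'Khartoum', 'Los Angeles']

unique = set(cities_seen)            # duplicates collapse to a single entry
print(len(cities_seen), len(unique))
print('Khartoum' in unique)          # hash-based membership test

# Set algebra: intersection (&) and difference (-)
warm = {'Khartoum', 'Los Angeles'}
print(unique & warm == warm)
print(unique - warm)
```

Only hashable (in practice, immutable) values can go into a set, which is why the list of tuples above could be converted with `set(lst)`.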


## Dictionaries (Maps/Hashmaps)
* Very quick access to the data associated with a key: `O(1)` on average

In [64]:
d = {}  # this is not a set
print_data(d)

{}
<class 'dict'>
2062688836632


In [65]:
# assign a new value
d['Ulaanbatar'] = (20, 30)
d

{'Ulaanbatar': (20, 30)}

In [67]:
d['Ulaanbatar'][0]

20

In [68]:
# create a dictionary from our list
d = {}  # reset dict
for name, temp, score in lst:
    d[name] = (temp, score)
d

{'Khartoum': (97.2, 3), 'Los Angeles': (80.0, 2), 'Tsetserleg': (-20.5, 5)}
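
Two things that come up constantly with dictionaries are looping over them and handling keys that might be missing. A minimal sketch, rebuilding the same `d` so it runs on its own (`'Paris'` is just an example of a key that is not present):

```python
d = {'Khartoum': (97.2, 3), 'Los Angeles': (80.0, 2), 'Tsetserleg': (-20.5, 5)}

# .items() yields (key, value) pairs; the value tuple can be unpacked inline.
for name, (temp, score) in d.items():
    print(name, temp, score)

# d['Paris'] would raise KeyError; .get returns a default instead.
print(d.get('Paris'))
print(d.get('Paris', (0, 0)))
print('Khartoum' in d)               # membership checks look at the keys
```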

## Application: What can we do with this?

There are many things, but because I want to focus on building useful utilities from the command line, let's create a utility that looks for a file in a directory. Windows can take forever to do this.

1. Recursively read files in all directories
2. If filename/directory contains word, print it out
3. Provide a summary of all files/dirs found, with the complete path and a score (how many of the search words each name contains)
4. In directories:  <-- this seems lame
	1. count the number of different types of extensions
5. For multiple words, show the directories/files associated with each
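
One possible starting point for steps 1 to 4 is sketched below. It is not the finished utility: the function names `find_matches` and `count_extensions` are made up here, the matching is a simple case-insensitive substring test, and step 5 is left out. It relies on the standard library's `os.walk` to visit every directory, and it runs against a throwaway temporary directory so the example does not depend on what is on your disk.

```python
import os
import tempfile


def find_matches(root, words):
    """Return {path: score} for each file or directory under `root` whose
    name contains at least one of `words` (case-insensitive)."""
    words = [w.lower() for w in words]
    matches = {}
    # os.walk visits `root` and every directory below it
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            lowered = name.lower()
            score = sum(1 for w in words if w in lowered)
            if score > 0:
                matches[os.path.join(dirpath, name)] = score
    return matches


def count_extensions(root):
    """Count how many files under `root` have each extension."""
    counts = {}
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1]
            counts[ext] = counts.get(ext, 0) + 1
    return counts


# Build a small directory tree to search (file names are invented).
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, 'reports'))
    for name in ('notes.txt', 'report_2020.txt',
                 os.path.join('reports', 'final_report.md')):
        with open(os.path.join(root, name), 'w') as fh:
            fh.write('placeholder')

    found = find_matches(root, ['report', 'final'])
    extensions = count_extensions(root)

    for path, score in sorted(found.items()):
        print(score, os.path.relpath(path, root))
    print(extensions)
```

The results are stored in plain dictionaries, so everything covered above (looping with `.items()`, `.get` with a default, tuples as records) applies directly.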
	

### Extra Credit:
* Modify this to look in file contents as well
* To open a file:

```python
with open(filepath) as fh:  # `filepath` is the path of the file to read
    contents = fh.read()  # the whole file as a single string
```
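
As a hedged sketch of how the extra credit might look, the helper below (a hypothetical `file_contains`, not part of any library) reads a file and checks for a word. It assumes the files are text; unreadable or missing files are simply treated as "no match", and undecodable bytes are ignored.

```python
import os
import tempfile


def file_contains(filepath, word):
    """Return True if `word` appears (case-insensitively) in the file's text."""
    try:
        with open(filepath, encoding='utf-8', errors='ignore') as fh:
            return word.lower() in fh.read().lower()
    except OSError:
        # missing or unreadable file: treat it as not matching
        return False


# Try it on a temporary file (the file name and text are invented).
with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, 'notes.txt')
    with open(path, 'w', encoding='utf-8') as fh:
        fh.write('Remember the Quarterly Report')

    has_report = file_contains(path, 'report')
    has_budget = file_contains(path, 'budget')
    missing = file_contains(os.path.join(root, 'nope.txt'), 'report')

print(has_report, has_budget, missing)
```

Reading whole files is fine for small text files; for very large files, looping over the file line by line would use less memory.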