# Introduction to Data Structures

This lesson will be a double-shot of essential Python types: **strings** and **dictionaries**.

# Strings
* A string is a sequence of characters.
* A string literal uses quotes: `'Hello'` or `"Hello"`.
* For strings, `+` means "concatenate".
* When a string contains digits, it is still a string.
* We can convert the digits in a string into a number using `int()`.

In [None]:
str1 = "Hello"
str2 ='there'

In [None]:
twoString = str1 + str2
print(twoString)

Hellothere


In [None]:
str3 = '123'
str3 = str3 + 1

TypeError: ignored

In [None]:
x = int(str3) + 1
print(x) 

124
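As a small sketch of the conversions mentioned above, the block below converts in both directions and shows what happens when the string does not hold a valid number. The variable names (`total`, `half`, `label`) are only illustrative.

```python
total = int('123') + 1        # digits only: converts cleanly
half = float('3.5') / 2       # float() handles a decimal point
label = str(123) + '1'        # str() goes the other way, so + concatenates
print(total, half, label)     # 124 1.75 1231

# int() raises ValueError when the string is not a valid integer.
try:
    int('12a')
except ValueError:
    print("'12a' cannot be converted to an int")
```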


In [None]:
x = 'Pluto is a planet'
y = "Pluto is a planet"
x == y

True

In [None]:
'Pluto's a planet!'

SyntaxError: ignored

We can fix this by "escaping" the single quote with a backslash. 

In [None]:
'Pluto\'s a planet!'

"Pluto's a planet!"

The table below summarizes some important uses of the backslash character.

| What you type... | What you get | Example                    | `print(example)`       |
|------------------|--------------|----------------------------|------------------------|
| `\'`             | `'`          | `'What\'s up?'`            | `What's up?`           |
| `\"`             | `"`          | `"That's \"cool\""`        | `That's "cool"`        |
| `\\`             | `\`          | `"Look, a mountain: /\\"`  | `Look, a mountain: /\` |
| `\n`             | <br/>        | `"1\n2 3"`                 | `1`<br/>`2 3`          |

The last sequence, `\n`, represents the *newline character*. It causes Python to start a new line.

In [None]:
hello = "hello\nworld"
print(hello)

hello
world
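To underline that `\n` is a single character rather than a backslash followed by an `n`, here is a short sketch comparing `len()` and `repr()` on the same string. The name `greeting` is just an illustrative choice.

```python
greeting = "hello\nworld"

# The escape sequence \n counts as one character: 5 + 1 + 5 = 11.
print(len(greeting))    # 11

# repr() shows the escape sequence instead of breaking the line.
print(repr(greeting))   # 'hello\nworld'
```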


In addition, Python's triple quote syntax for strings lets us include newlines literally (i.e. by just hitting 'Enter' on our keyboard, rather than using the special '\n' sequence). We've already seen this in the docstrings we use to document our functions, but we can use them anywhere we want to define a string.

In [None]:
triplequoted_hello = """hello
world"""
print(triplequoted_hello)
triplequoted_hello == hello

hello
world


True

The `print()` function automatically adds a newline character unless we specify a value for the keyword argument `end` other than the default value of `'\n'`:

In [None]:
print("hello")
print("world")
print("hello", end='')
print("pluto", end='')

hello
world
hellopluto

**Looking Inside Strings**

Strings can be thought of as sequences of characters. Almost everything we've seen that we can do to a list, we can also do to a string.

In [None]:
# Indexing
fruit = 'banana'
letter = fruit[1]
print(letter)

a


In [None]:
# Slicing
fruit[-3:]

'ana'
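A few more slices may help make the start, stop, and step parts concrete. This is only a sketch using the same `fruit` string; the stop index is always excluded.

```python
fruit = 'banana'

print(fruit[:3])    # first three characters: ban
print(fruit[2:5])   # characters at indexes 2, 3 and 4: nan
print(fruit[-3:])   # last three characters: ana
print(fruit[::-1])  # a step of -1 walks backwards: ananab
```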

In [None]:
# How long is this string?
len(fruit)

6

**A Character Too Far**
* You will get a Python error (an `IndexError`) if you attempt to index beyond the end of a string.

In [None]:
print(fruit[6])

IndexError: ignored
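Two common ways to avoid this error are sketched below: checking the index against `len()` first, or attempting the lookup and handling the `IndexError`. The variable `position` is a hypothetical index chosen to be one past the last valid index.

```python
fruit = 'banana'
position = 6  # hypothetical index; the last valid index is len(fruit) - 1 == 5

# Option 1: check the index against the length before using it.
if 0 <= position < len(fruit):
    print(fruit[position])
else:
    print("index", position, "is out of range")

# Option 2: attempt the lookup and handle the IndexError.
try:
    letter = fruit[position]
except IndexError:
    letter = None
print(letter)  # None
```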

### Looping Through Strings
Using a **while** statement, an **iteration variable**, and the **len** function, we can construct a loop that looks at each letter in a string individually.

In [None]:
index = 0
while index < len(fruit):
  letter = fruit[index]
  print(index, letter)
  index = index + 1

0 b
1 a
2 n
3 a
4 n
5 a


* A definite loop using a **for** statement is much more elegant.
* The **iteration variable** is completely taken care of by the **for** loop.

In [None]:
for letter in fruit:
  print(letter)

b
a
n
a
n
a
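If we still want the index alongside each letter, the built-in `enumerate()` gives us both without managing a counter by hand. The sketch below reproduces the output of the earlier `while` loop.

```python
fruit = 'banana'

# enumerate() yields (index, character) pairs, so no manual counter is needed.
for index, letter in enumerate(fruit):
    print(index, letter)
```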


In [None]:
# Yes, we can even loop over them
[char for char in fruit]

['b', 'a', 'n', 'a', 'n', 'a']

But a major way in which strings differ from lists is that they are *immutable*. We can't modify them.

In [None]:
fruit[0] = 'K'
# fruit.append doesn't work either

TypeError: ignored
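Since a string cannot be changed in place, the usual workaround is to build a new string from pieces of the old one. A minimal sketch, with `new_fruit` as an illustrative name:

```python
fruit = 'banana'

# Slicing and concatenation create a brand-new string; fruit is untouched.
new_fruit = 'K' + fruit[1:]
print(new_fruit)  # Kanana
print(fruit)      # banana

# str.replace() also returns a new string. The third argument limits the
# number of replacements (here, only the first 'b').
print(fruit.replace('b', 'K', 1))  # Kanana
```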

## String methods

* Like `list`, the type `str` has lots of very useful methods. I'll show just a few examples here.
* These **methods** do not modify the original string; instead, they return a new string that has been altered.

In [None]:
dir(fruit)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

In [None]:
# ALL CAPS
claim = "Pluto is a planet!"
claim.upper()

'PLUTO IS A PLANET!'

In [None]:
# all lowercase
claim.lower()

'pluto is a planet!'

In [None]:
# Searching for the first index of a substring
claim.index('plan')

11

In [None]:
claim.startswith('P')

True

In [None]:
claim.endswith('!')

True
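The sketch below illustrates two points from this section: string methods leave the original untouched, and `find()` is a gentler relative of `index()` for substrings that may be absent. The name `shout` is illustrative.

```python
claim = "Pluto is a planet!"

shout = claim.upper()
print(shout)  # PLUTO IS A PLANET!
print(claim)  # Pluto is a planet!  (the original is unchanged)

# find() returns -1 when the substring is missing; index() would raise ValueError.
print(claim.find('plan'))  # 11
print(claim.find('moon'))  # -1

# split() with no arguments breaks the string on whitespace.
print(claim.split())  # ['Pluto', 'is', 'a', 'planet!']
```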

### Building strings with `.format()`

Python lets us concatenate strings with the `+` operator.

In [None]:
planet = 'Pluto'

In [None]:
planet + ', we miss you.'

'Pluto, we miss you.'

If we want to throw in any non-string objects, we have to be careful to call `str()` on them first:

In [None]:
position = 9
planet + ", you'll always be the " + position + "th planet to me."

TypeError: ignored

In [None]:
planet + ", you'll always be the " + str(position) + "th planet to me."

"Pluto, you'll always be the 9th planet to me."

This is getting hard to read and annoying to type. `str.format()` to the rescue.

In [None]:
"{}, you'll always be the {}th planet to me.".format(planet, position)

"Pluto, you'll always be the 9th planet to me."

So much cleaner! We call `.format()` on a "format string", where the Python values we want to insert are represented with `{}` placeholders.

Notice how we didn't even have to call `str()` to convert `position` from an int. `format()` takes care of that for us.

If that was all that `format()` did, it would still be incredibly useful. But as it turns out, it can do a *lot* more. Here's just a taste:

In [None]:
pluto_mass = 1.303 * 10**22
earth_mass = 5.9722 * 10**24
population = 52910390
#         2 significant digits   3 decimal places, format as percent     separate with commas
"{} weighs about {:.2} kilograms ({:.3%} of Earth's mass). It is home to {:,} Plutonians.".format(
    planet, pluto_mass, pluto_mass / earth_mass, population,
)

"Pluto weighs about 1.3e+22 kilograms (0.218% of Earth's mass). It is home to 52,910,390 Plutonians."

In [None]:
# Referring to format() arguments by index, starting from 0
s = """Pluto's a {0}.
No, it's a {1}.
{0}!
{1}!""".format('planet', 'dwarf planet')
print(s)

Pluto's a planet.
No, it's a dwarf planet.
planet!
dwarf planet!
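Besides positional indexes, `str.format()` also accepts named placeholders, which can be easier to read when a value is reused. The following is a small sketch; the names `template` and `sentence` and the placeholder names are chosen for illustration.

```python
template = "{planet}, you'll always be the {position}th planet to me. Goodnight, {planet}."

# Keyword arguments fill the placeholders with matching names.
sentence = template.format(planet='Pluto', position=9)
print(sentence)
# Pluto, you'll always be the 9th planet to me. Goodnight, Pluto.
```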


You could probably write a short book just on `str.format`, so I'll stop here, and point you to [pyformat.info](https://pyformat.info/) and [the official docs](https://docs.python.org/3/library/string.html#formatstrings) for further reading.

# Dictionaries

* Dictionaries are Python’s most powerful data collection 
* Dictionaries allow us to do fast database-like operations in Python
* Dictionaries have different names in different languages 
  - Associative Arrays - Perl / PHP 
  - Properties or Map or HashMap - Java 
  - Property Bag - C# / .NET

In [None]:
purse = dict()
# purse = {}

In [None]:
purse['money'] = 200
purse['candy'] = 4
purse['tissues'] = 75
print(purse)

{'money': 200, 'candy': 4, 'tissues': 75}


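**Dictionaries** are like bags: we look things up by key, not by position. (Since Python 3.7 a dictionary remembers the order in which keys were inserted, but it still has no positional indexing.)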

In [None]:
purse['money']

200

So we index the things we put in the dictionary with a "lookup tag", also called a "key".

In [None]:
numbers = {'one':1, 'two':2, 'three':3}

In this case `'one'`, `'two'`, and `'three'` are the **keys**, and 1, 2 and 3 are their corresponding values.

Values are accessed via square bracket syntax similar to indexing into lists and strings.

In [None]:
numbers['one']

1
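Square brackets only work for keys that exist. The sketch below shows how to test for a key with `in`, and what happens when a missing key is looked up directly. The key `'four'` is simply an example of a key that is not in the dictionary.

```python
numbers = {'one': 1, 'two': 2, 'three': 3}

# The in operator tests whether a key exists.
print('one' in numbers)   # True
print('four' in numbers)  # False

# Looking up a missing key with square brackets raises KeyError.
try:
    numbers['four']
except KeyError as err:
    print("missing key:", err)  # missing key: 'four'
```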

We can use the same syntax to add another key, value pair:

In [None]:
numbers['eleven'] = 11
numbers

{'eleven': 11, 'one': 1, 'three': 3, 'two': 2}

Or to change the value associated with an existing key:

In [None]:
numbers['one'] = 'Pluto'
numbers

{'eleven': 11, 'one': 'Pluto', 'three': 3, 'two': 2}

Python has *dictionary comprehensions* with a syntax similar to the list comprehensions we saw in the previous tutorial.

In [None]:
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']
planet_to_initial = {planet: planet[0] for planet in planets}
planet_to_initial

{'Earth': 'E',
 'Jupiter': 'J',
 'Mars': 'M',
 'Mercury': 'M',
 'Neptune': 'N',
 'Saturn': 'S',
 'Uranus': 'U',
 'Venus': 'V'}
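Like list comprehensions, dictionary comprehensions can include an `if` clause to filter the items. As a sketch, the example below keeps only the planets whose names are longer than five letters; `long_names` is an illustrative name.

```python
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']

# Map each planet name to its length, keeping only names longer than five letters.
long_names = {planet: len(planet) for planet in planets if len(planet) > 5}
print(long_names)
# {'Mercury': 7, 'Jupiter': 7, 'Saturn': 6, 'Uranus': 6, 'Neptune': 7}
```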

A common use of dictionaries is counting how often we "see" something, that is, computing frequencies.

In [None]:
counts = dict()
names = ['Ali', 'Tom', 'Smanga', 'Tom', 'Ali', 'Smanga', 'Ali']
for name in names:
  if name not in counts:
    counts[name] = 1
  else:
    counts[name] = counts[name] + 1
print(counts)

{'Ali': 3, 'Tom': 2, 'Smanga': 2}


Another solution uses the **get** method.

We can use **get()** and provide a **default value of zero** when the **key** is not yet in the dictionary - and then just add one

In [None]:
counts = dict()
names = ['Ali', 'Tom', 'Smanga', 'Tom', 'Ali', 'Smanga', 'Ali']
for name in names:
  counts[name] = counts.get(name,0) + 1
print(counts)

{'Ali': 3, 'Tom': 2, 'Smanga': 2}
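The same `get()` pattern combines nicely with the string material from earlier in this lesson. Here is a minimal sketch that counts the letters in a string and then picks the most frequent one; `letter_counts` and `most_common` are illustrative names.

```python
fruit = 'banana'

letter_counts = {}
for letter in fruit:
    # get() returns 0 the first time a letter is seen, so no if/else is needed.
    letter_counts[letter] = letter_counts.get(letter, 0) + 1
print(letter_counts)  # {'b': 1, 'a': 3, 'n': 2}

# max() compares the keys by their counts and returns the key with the largest one.
most_common = max(letter_counts, key=letter_counts.get)
print(most_common)  # a
```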


A `for` loop over a dictionary will loop over its keys:

In [None]:
for k in numbers:
    print("{} = {}".format(k, numbers[k]))

one = Pluto
two = 2
three = 3
eleven = 11


We can access a collection of all the keys or all the values with `dict.keys()` and `dict.values()`, respectively.
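As a quick sketch of `keys()` and `values()`, the block below rebuilds the `counts` dictionary from the counting example and inspects both views. Wrapping a view in `list()` is just a convenient way to print its contents.

```python
counts = {'Ali': 3, 'Tom': 2, 'Smanga': 2}

print(list(counts.keys()))    # ['Ali', 'Tom', 'Smanga']
print(list(counts.values()))  # [3, 2, 2]

# The views work with ordinary built-ins such as sum() and sorted().
print(sum(counts.values()))   # 7
print(sorted(counts.keys()))  # ['Ali', 'Smanga', 'Tom']
```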

The very useful `dict.items()` method lets us iterate over the keys and values of a dictionary simultaneously. (In Python jargon, an **item** refers to a key, value pair)

In [None]:
for key, value in counts.items():
  print(key, value)

Ali 3
Tom 2
Smanga 2


To read a full inventory of dictionaries' methods, see the output of `help(dict)` below, or check out the [official online documentation](https://docs.python.org/3/library/stdtypes.html#dict).

In [None]:
help(dict)

Help on class dict in module builtins:

class dict(object)
 |  dict() -> new empty dictionary
 |  dict(mapping) -> new dictionary initialized from a mapping object's
 |      (key, value) pairs
 |  dict(iterable) -> new dictionary initialized as if via:
 |      d = {}
 |      for k, v in iterable:
 |          d[k] = v
 |  dict(**kwargs) -> new dictionary initialized with the name=value pairs
 |      in the keyword argument list.  For example:  dict(one=1, two=2)
 |  
 |  Methods defined here:
 |  
 |  __contains__(self, key, /)
 |      True if D has a key k, else False.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |