# Welcome back!

# Data, data, data. Programming is *all* about manipulating data

### Note that there are different *data types* in Python such as *int (integer), str (string), list (self-explanatory), float (a "floating point number" is a decimal number), tuple (a special data structure similar to a list, but with important differences), dict (dictionary)*, and more.

### "Primitive" data types are the most basic and include int, float, str, bool (a logical data type that is just "True" or "False") and NoneType (a special data type in Python which has just one value: "None").

### Other data types are those which are composed of primitive data types. We call them (wait for it) "composite" data types. These include lists, tuples, dictionaries etc. Strings are a bit of a special case: Python gives us str as a basic built-in type, but a string behaves like a sequence of characters, much like a list does. In this lesson we're going to go over lists and strings.
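
### Here's a small sketch of the data types mentioned above, using the built-in "type()" function. The variable names (whole, decimal, text, ...) are made up for this example. Note that plain Python prints types as \<class 'int'\>, while a notebook cell shows just int.

```python
# One example value for each data type (names are just for illustration)
whole = 4               # int
decimal = 4.0           # float
text = '4'              # str
flag = True             # bool
nothing = None          # NoneType
items = [4, 4.0, '4']   # list
pair = (4, 'four')      # tuple
lookup = {'four': 4}    # dict

print(type(whole))      # <class 'int'>
print(type(decimal))    # <class 'float'>
print(type(text))       # <class 'str'>
print(type(flag))       # <class 'bool'>
print(type(nothing))    # <class 'NoneType'>
print(type(items))      # <class 'list'>
print(type(pair))       # <class 'tuple'>
print(type(lookup))     # <class 'dict'>
```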


### Anything wrapped in single or double quotes in Python is a *str* data type, even if what's inside the quotes looks like a number.

Example:

In [2]:
x = '4'

In [3]:
type(x)

str

### More detail on strings later

### A *list* is a collection of elements, which can be of *any* data type (including lists themselves). The elements are wrapped in square brackets and separated by commas.

In [5]:
my_list = [1,'hello',['a','b','c']]

### Python allows you to access elements of lists through something called *indexing*, which is done by doing \<list_name\>[i]. *Important*: Python indexing starts at 0, not at 1!

In [6]:
my_list[0]

1

In [7]:
my_list[1]

'hello'

In [8]:
my_list[2]

['a', 'b', 'c']

### We'll get an error if we try to access indices of a list that do not exist

In [9]:
my_list[3]

IndexError: list index out of range
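
### If you're not sure which indices exist, the built-in "len()" function tells you how many elements a list has. The sketch below also uses try/except, which we haven't covered yet; for now just read it as "try this, and if it fails with an IndexError, do that instead".

```python
my_list = [1, 'hello', ['a', 'b', 'c']]

# len() counts the elements, so the valid indices
# run from 0 up to len(my_list) - 1
print(len(my_list))                # 3
print(my_list[len(my_list) - 1])   # ['a', 'b', 'c']

# Index 3 does not exist, so Python raises an IndexError.
# try/except catches the error instead of stopping the program.
try:
    my_list[3]
except IndexError as error:
    print('Oops:', error)          # Oops: list index out of range
```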

### Lists can be added to other lists, to produce a combined list

In [21]:
other_list = [5,6,7.4]

In [22]:
combined_list = my_list + other_list

In [23]:
combined_list

[1, 'hello', ['a', 'b', 'c'], 5, 6, 7.4]

In [24]:
combined_list[3]

5
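
### Two things worth checking for yourself (a quick sketch): adding lists builds a *new* list and leaves the originals alone, and the order in which you add them matters.

```python
my_list = [1, 'hello', ['a', 'b', 'c']]
other_list = [5, 6, 7.4]

combined_list = my_list + other_list

# The original lists still have 3 elements each; only the new list has 6
print(len(my_list), len(other_list), len(combined_list))   # 3 3 6

# Swapping the order gives a different list
print(other_list + my_list)   # [5, 6, 7.4, 1, 'hello', ['a', 'b', 'c']]
```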

### Sometimes we have lists of lists (called nested lists), and we want to access an element of an inner list. Here's how we do it

In [25]:
combined_list[2][1]

'b'

### Why does this work? It's very important to understand this. The element in the third position (i.e. at index 2) of 'combined_list' is itself a list, so 'combined_list[2]' *returns* a list. And if 'combined_list[2]' returns a list, then it *is* a list: the thing you ask Python for is *equal* to the thing it returns to you. Does this make sense? If not, read this again or try using online resources to make sure you get it. Here's how understanding this is helpful: since 'combined_list[2]' is a list, and lists can be indexed into, it makes sense to do 'combined_list[2][1]'. If it helps, you can think of it as '(combined_list[2])[1]' or '(the list returned by combined_list[2])[1]'.
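
### One way to convince yourself is to do the indexing in two separate steps. In this sketch, "inner" is just a made-up variable name for the list that comes back from the first step.

```python
combined_list = [1, 'hello', ['a', 'b', 'c'], 5, 6, 7.4]

# Step 1: ask for the element at index 2, which is itself a list
inner = combined_list[2]
print(inner)                              # ['a', 'b', 'c']

# Step 2: index into that list
print(inner[1])                           # b

# Doing both steps at once gives exactly the same thing
print(combined_list[2][1] == inner[1])    # True
```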

### There is no limit on the depth of nesting

In [27]:
nested_lists = [1,2,[3,4,[5,6,['deepest list']]]]

In [33]:
nested_lists[2]

[3, 4, [5, 6, ['deepest list']]]

In [32]:
nested_lists[2][2]

[5, 6, ['deepest list']]

In [34]:
nested_lists[2][2][2]

['deepest list']

In [36]:
nested_lists[2][2][2][0]

'deepest list'

### What do you think happens if we try to index further after reaching the deepest string element in nested_lists?

In [37]:
nested_lists[2][2][2][0][0]

'd'

### Wait, what?! Apparently, Python lets you index strings the same way you index lists. Let's look more at strings:

In [38]:
string1 = 'Hello World!'

In [39]:
string1[1]

'e'

### Strings can also be added the way that lists are added

In [43]:
part1 = 'Hello'
part2 = ' World!'

In [44]:
part1+part2

'Hello World!'

### Not only can we access single elements of lists, but we can also access chunks or *slices* of lists, using a technique in Python called "slicing". Here's how:

In [1]:
numbers = [0,1,2,3,4,5,6,7,8,9]

In [2]:
letters = ['A','B','C','D','E']

In [49]:
numbers[2:6]  # the syntax is <list name>[starting index:ending index]; the element at the ending index is NOT included

[2, 3, 4, 5]

In [50]:
numbers[1:9]

[1, 2, 3, 4, 5, 6, 7, 8]
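
### Notice that a slice stops just *before* the ending index. A handy consequence is that the length of the slice is simply (ending index - starting index). Here's a quick sketch; "start", "stop" and "chunk" are made-up names.

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

start, stop = 2, 6
chunk = numbers[start:stop]

print(chunk)                        # [2, 3, 4, 5]
print(len(chunk) == stop - start)   # True

# The element AT the ending index is the first one left out
print(numbers[stop])                # 6
```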

### We can omit the starting index and Python just interprets that as "start from index 0"

In [51]:
numbers[:6]

[0, 1, 2, 3, 4, 5]

### Or we can omit the ending index and Python interprets that as "give me everything from the start index until the end"

In [52]:
numbers[2:]

[2, 3, 4, 5, 6, 7, 8, 9]

### If we omit both the starting and ending indices we just get the whole list back (strictly speaking, a new copy of it), so we see Python is logically consistent!

In [53]:
numbers[:]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
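
### The full slice has one practical use: it gives you a *new* list with the same elements, so you can change the copy without touching the original. A small sketch ("copy_of_numbers" is a made-up name):

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# numbers[:] builds a new list containing the same elements
copy_of_numbers = numbers[:]

# Changing the copy...
copy_of_numbers[0] = 'changed'

print(copy_of_numbers[0])   # changed
print(numbers[0])           # 0  (the original is untouched)
```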

### We can also index into a list from the end, rather than from the beginning 

In [54]:
numbers[-1]

9

In [55]:
numbers[-2:]

[8, 9]

### We can also reverse the order of a list in the following way:

In [57]:
numbers[::-1]

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
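
### What's going on with that second colon? A slice can take a third number, the *step*, which says how far to jump between elements. A step of -1 walks backwards. Here are a few more examples to experiment with:

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# The full syntax is <list name>[start:stop:step]
print(numbers[::2])      # [0, 2, 4, 6, 8]   every second element
print(numbers[1::2])     # [1, 3, 5, 7, 9]   same, but starting at index 1
print(numbers[::-1])     # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(numbers[7:2:-1])   # [7, 6, 5, 4, 3]   backwards from index 7, stopping before index 2

# Slicing never changes the original list
print(numbers)           # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```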

### It turns out that there are a ton of built-in Python functions for dealing with lists. We can always see which ones are available for a built-in data type (or any other object) using the "dir()" function

In [59]:
dir(numbers)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']
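
### The names surrounded by double underscores are for Python's internal use (more on those later). If you only want the "everyday" names, one possible sketch is to filter them out. This uses a list comprehension, which we haven't covered yet, and "public_names" is a made-up variable name.

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Keep only the names that do NOT start with a double underscore
public_names = [name for name in dir(numbers) if not name.startswith('__')]

print(public_names)
```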

### Let's see what the "sort" method does. (Wait, what's a method? Well, it's a special type of function. What's a function? Well, that's a couple lessons away. For now, just think of a function as a reusable bit of code that does some specific task). We "call" methods by using "dot notation". It goes like this: 

### \<data_instance\>.\<method_name\>(\<inputs\>) 

### (Note: when we want to show syntax without using a specific name of a variable/function/instance-of-data we use the \< and \> symbols. In reality, you substitute the \<"stuff"\> with actual names of variables/functions/instances-of-data.)

In [14]:
help(letters.sort)

Help on built-in function sort:

sort(*, key=None, reverse=False) method of builtins.list instance
    Stable sort *IN PLACE*.



### Calling the help function on the sort method for lists tells us that it sorts the elements of the list *in place*. That means it doesn't produce a separate, sorted copy of our list as new data; it rearranges the list itself. So you do not have to save the sorted list to a variable: it's already saved in the variable you made for the list.

In [8]:
letters

['A', 'B', 'C', 'D', 'E']

### This list is already sorted, so to see an example of the sort method in action let's create an unsorted list.

In [1]:
messy_list=['f','a','z','c']

In [2]:
messy_list.sort()

In [3]:
messy_list

['a', 'c', 'f', 'z']

### So we see it works as we expect. We see the sort method also has a "reverse" option which appears to be set to "False" by default. Let's see what happens if we call sort again on our messy_list, but with the reverse option set to "True"

In [5]:
messy_list.sort(reverse=True)

In [6]:
messy_list

['z', 'f', 'c', 'a']
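
### What if you *do* want a separate sorted list and want to keep the original as it is? Python also has a built-in function called "sorted()" for that. Here's a side-by-side sketch ("new_list" and "result" are made-up names). Note that the sort method itself gives back None, precisely because it works in place.

```python
messy_list = ['f', 'a', 'z', 'c']

# sorted() returns a NEW list and leaves the original alone
new_list = sorted(messy_list)
print(new_list)       # ['a', 'c', 'f', 'z']
print(messy_list)     # ['f', 'a', 'z', 'c']

# .sort() changes the list in place and returns None
result = messy_list.sort()
print(result)         # None
print(messy_list)     # ['a', 'c', 'f', 'z']
```

### A common beginner mistake is writing messy_list = messy_list.sort(), which replaces your list with None.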

### Aha! It all makes sense!

### There are all kinds of methods for lists. Let's see what the "pop" method does. It sounds interesting.

In [7]:
messy_list.pop()

'a'

In [8]:
messy_list

['z', 'f', 'c']

### So it looks like it removed the last element in our list and did so in place. Hmm. Maybe that could be useful some day.
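
### Unlike sort, pop also *returns* the element it removed, so you can save it in a variable. You can also give pop an index to remove an element from somewhere other than the end. A short sketch ("last" and "first" are made-up names):

```python
messy_list = ['z', 'f', 'c', 'a']

# With no input, pop removes and returns the LAST element
last = messy_list.pop()
print(last)           # a

# With an index, pop removes and returns the element at that index
first = messy_list.pop(0)
print(first)          # z

# Both removals happened in place
print(messy_list)     # ['f', 'c']
```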

### Let's see what kind of built-in methods there are for strings! 

In [9]:
my_string = 'This is a string.'

In [10]:
dir(my_string)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']


### Sometimes you can find useful information about how a function or method works by checking its *docstring*. We can do this with the \__doc__ dunder ("double underscore") attribute (more on those later).

In [41]:
''.translate.__doc__

'Replace each character in the string using the given translation table.\n\n  table\n    Translation table, which must be a mapping of Unicode ordinals to\n    Unicode ordinals, strings, or None.\n\nThe table must implement lookup/indexing via __getitem__, for instance a\ndictionary or list.  If this operation raises LookupError, the character is\nleft untouched.  Characters mapped to None are deleted.'
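
### That output is hard to read because the line breaks show up as "\n". Passing the docstring to "print()" displays it with real line breaks. A small sketch ("doc" and "first_line" are made-up names; the exact wording of the docstring may differ between Python versions):

```python
# The docstring is just an ordinary string stored on the method
doc = ''.translate.__doc__

# print() turns each \n into an actual new line
print(doc)

# Since it's a string, we can use string methods on it too,
# e.g. to grab only the first line
first_line = doc.splitlines()[0]
print(first_line)
```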

In [37]:
type('')

str

In [24]:
my_string.rsplit(' ')

['This', 'is', 'a', 'string.']

In [33]:
' and '.join(letters)

'A and B and C and D and E'
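
### Splitting and joining are opposites of each other: one turns a string into a list of pieces, the other glues a list of strings back together. Here's a sketch using "split" (the more common cousin of "rsplit"; with just a separator they give the same result). "words" and "rejoined" are made-up names.

```python
my_string = 'This is a string.'

# Break the string apart wherever there is a space
words = my_string.split(' ')
print(words)                    # ['This', 'is', 'a', 'string.']

# Glue the pieces back together with a space between them
rejoined = ' '.join(words)
print(rejoined == my_string)    # True

# The string before .join is the "glue", and it can be anything
print('-'.join(words))          # This-is-a-string.
```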

In [40]:
turing = "Born in Maida Vale, London, Turing was raised in southern England. He graduated from King's College, Cambridge, with a degree in mathematics. Whilst he was a fellow at Cambridge, he published a proof demonstrating that some purely mathematical yes–no questions can never be answered by computation. He defined a Turing machine and proved that the halting problem for Turing machines is undecidable. In 1938, he earned his PhD from the Department of Mathematics at Princeton University."

In [35]:
turing

"Born in Maida Vale, London, Turing was raised in southern England. He graduated from King's College, Cambridge, with a degree in mathematics. Whilst he was a fellow at Cambridge, he published a proof demonstrating that some purely mathematical yes–no questions can never be answered by computation. He defined a Turing machine and proved that the halting problem for Turing machines is undecidable. In 1938, he earned his PhD from the Department of Mathematics at Princeton University."

In [36]:
turing = turing.rsplit(' ')

In [41]:
len(turing)

76
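
### Careful: "len()" counts different things depending on what you give it. For a string it counts characters (including spaces); for a list it counts elements. A sketch with a shorter, made-up "sentence" variable:

```python
sentence = 'He defined a Turing machine'

# On a string, len() counts every character, spaces included
print(len(sentence))    # 27

# After splitting, we have a list, so len() counts the words
words = sentence.split(' ')
print(words)            # ['He', 'defined', 'a', 'Turing', 'machine']
print(len(words))       # 5
```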

In [42]:
False

False

In [44]:
type(True)

bool

In [45]:
type(my_string)

str

In [50]:
type(numbers[0])

int

In [1]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  

### In programming, any time we want to do a task that we know we will do repeatedly, we encode the instructions for that task into something called a function. Functions can be *defined* ahead of time, and then *called* whenever you are ready to use them. When we create a data type in programming we describe to the computer everything about that data type (how to refer to it, how to combine it with similar data etc.) in something called a *class*. Whenever you define a class, you can create very specific functions that work on that class, and these special functions are called *methods*
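
### We'll cover functions and classes properly in later lessons, but here is a rough preview of what each looks like. Everything named here ("shout", "Greeting", "say_hello") is invented just for this sketch.

```python
# A function: defined once with "def", then called whenever we need it
def shout(text):
    """Return the text in capital letters with an exclamation mark."""
    return text.upper() + '!'

# A class: describes a new data type
class Greeting:
    def __init__(self, name):
        # Runs when a new Greeting is created; stores the name
        self.name = name

    # A method: a function that belongs to the class
    def say_hello(self):
        return 'Hello, ' + self.name

# Calling the function
print(shout('hello'))            # HELLO!

# Creating an instance of the class and calling its method with dot notation
greeting = Greeting('World')
print(greeting.say_hello())      # Hello, World
```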

### To ask for help on a specific method of a data type (i.e. a class), do help(\<class\>.\<method\>)
    

In [4]:
help(str.isalpha)

Help on method_descriptor:

isalpha(self, /)
    Return True if the string is an alphabetic string, False otherwise.
    
    A string is alphabetic if all characters in the string are alphabetic and there
    is at least one character in the string.



### Let's test it out!

In [10]:
test_str1 = 'AFDFEGWREWBRFVREW'
test_str2 = 'AFDFEGW443EWBRFVREW'

print("test_str1 is alphabetic:", test_str1.isalpha())
print("test_str2 is alphabetic:", test_str2.isalpha())

test_str1 is alphabetic: True
test_str2 is alphabetic: False
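
### The help text says *all* characters must be alphabetic and there must be at least one character. A few more cases to try, which follow directly from that description:

```python
# Only letters: True
print('Hello'.isalpha())          # True

# A space is not a letter, so this is False
print('Hello World'.isalpha())    # False

# Punctuation is not a letter either
print('Hello!'.isalpha())         # False

# The empty string has no characters at all, so this is False too
print(''.isalpha())               # False
```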


### Recall that strings are treated very similarly to lists in Python (although they really are distinct data types), so it shouldn't be a surprise that we can slice strings just like we do with lists

In [11]:
test_str = 'Here is a string for us to test slicing with.'

In [12]:
test_str[3:5]

'e '

In [13]:
test_str[:5]

'Here '

### We can even reverse strings the same way we reverse lists

In [14]:
test_str[::-1]

'.htiw gnicils tset ot su rof gnirts a si ereH'

### Here's a cool method to find a specific word (or any piece of text) and return the index of the beginning of the *first* instance of it in the string:

In [15]:
test_str.index('slicing')

32

### So now we know that the 's' in "slicing" is at index 32 in our string and we can do this:

In [16]:
test_str[32:]

'slicing with.'
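
### We can combine "index" with "len()" to cut out exactly the word we searched for. And what if the word isn't in the string at all? Then "index" raises an error, while the related "find" method quietly returns -1. A sketch ("start" and "end" are made-up names; try/except is covered in a later lesson):

```python
test_str = 'Here is a string for us to test slicing with.'

# Where the word starts, and where it ends
start = test_str.index('slicing')
end = start + len('slicing')
print(test_str[start:end])        # slicing

# find() returns -1 when the text is not there
print(test_str.find('python'))    # -1

# index() raises a ValueError instead
try:
    test_str.index('python')
except ValueError as error:
    print('Oops:', error)         # Oops: substring not found
```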

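### One last thing about strings: what if the text itself contains quotes? If we wrap a string in double quotes and it also has double quotes inside, Python thinks the string ends at the first inner quote and gets confused. We can wrap the string in single quotes instead, and if the text contains a single quote too, we can "escape" it with a backslash.

In [2]: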
print("And then he said, "Hello".")

SyntaxError: invalid syntax (1641284142.py, line 1)

In [5]:
print('And then he said, "Hello".')

And then he said, "Hello".


In [6]:
print('And then he said, "Hello, that\'s a nice hat".')

And then he said, "Hello, that's a nice hat".
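
### To sum up, here are three ways to write strings that contain quotes (a sketch; "a", "b" and "c" are throwaway names). The third one uses triple quotes, another way Python lets you wrap a string, which is handy when the text has both kinds of quotes inside.

```python
# 1. Wrap in single quotes when the text contains double quotes
a = 'And then he said, "Hello".'

# 2. Or keep the double quotes and escape the inner ones with a backslash
b = "And then he said, \"Hello\"."

# Both spellings produce exactly the same string
print(a == b)     # True

# 3. Triple quotes let us use both kinds of quotes without escaping
c = """And then he said, "Hello, that's a nice hat"."""
print(c)          # And then he said, "Hello, that's a nice hat".
```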
