#  Python for Economic and Social Data Science: Lecture One

### 15th July, 2024
---

## Section 1: Introduction

Welcome! This is the first of five lectures in this short informal module. For introductory information outlining the class, please see the readme.md file in the [GitHub repository](https://github.com/crahal/Python_for_DataScience). Please note: this set of five lectures will revolve around Python 3.x, and while you can follow the examples and homeworks in a text editor and execute the scripts via the command line, [Jupyter Notebooks](https://jupyter.org/) are ***strongly*** advised! All important Python concepts are in boldface where possible.

#### Learning Objectives:

In this very introductory lecture we will:

* Introduce Python and Git
* Cover primitive data types (characters and numbers)
* Introduce various types of object.

Without getting ahead of ourselves, we can note at this early stage that in Python, everything is an object (except control flows), and that it is an object-oriented language (although not a 'pure' one).


### Section 1.1: The Command Line

Using the command line for data science offers powerful capabilities for managing, processing, and analyzing data efficiently. It provides direct access to tools like Python, R, and specialized libraries (e.g., pandas, numpy) for data manipulation, statistical analysis, and machine learning.

Key tasks include navigating file systems to access datasets, performing complex data transformations using command-line tools (e.g., sed, awk), and automating repetitive tasks with shell scripting. The command line allows for seamless integration of data processing pipelines, enhancing reproducibility and scalability in projects.

Furthermore, version control with Git is seamlessly integrated into command-line workflows, enabling tracking of changes in code, data, and analysis scripts. This ensures collaboration and facilitates the documentation of project evolution over time.

However, your use of the command line will vary depending on not just what you are doing, but also on which operating system you are using.

**Windows Command Prompt (cmd.exe):**

* *Shell:* Uses Command Prompt (cmd.exe) with its own set of commands.
* *File Paths:* Uses backslashes (\\) in file paths (e.g., C:\Users\Username\Documents).
* *Capabilities:* Limited compared to Unix-like systems in built-in commands and utilities.

**macOS and Linux:**

* *Shell:* Typically uses Bash or Zsh.
* *File Paths:* Uses forward slashes (/) in file paths (e.g., /Users/Username/Documents).
* *Capabilities:* Rich set of built-in commands and utilities, extensive support for scripting and automation.
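
Because the path separator differs between operating systems, it is usually safer to let Python build paths than to type the slashes by hand. Here is a minimal sketch using the standard library's `pathlib` module. The folder names are the hypothetical ones from the examples above, and the variable names are our own:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical folder names, taken from the examples above.
folders = ["Users", "Username", "Documents"]

# The same folders rendered with each operating system's separator.
windows_path = PureWindowsPath("C:/", *folders)
posix_path = PurePosixPath("/", *folders)

print(windows_path)  # C:\Users\Username\Documents
print(posix_path)    # /Users/Username/Documents
```

In everyday scripts, `pathlib.Path` picks the right flavour for the machine that the code is running on.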


### Section 1.2: Git

Git is a version control system that is widely used for data-related projects. It tracks changes in datasets, code, and configuration files, facilitating collaboration and reproducibility. By using Git, data scientists can maintain a history of changes, revert to previous versions, and manage branches for different analysis paths or experiments.

Key benefits include transparency in project evolution, enabling teams to understand who made what changes and when. It ensures data integrity by documenting modifications and providing a structured workflow for data manipulation and analysis. Git also supports integration with platforms like GitHub or GitLab for remote collaboration and backup.

Here are some useful Git commands for us to think about:

* **git clone**: Creates a local copy of a remote repository onto your machine. It downloads all files, branches, and commit history from the remote repository to your local environment, allowing you to work on the project locally.

* **git pull**: Updates your local repository with changes from the remote repository. It fetches the latest changes (commits, files) from the remote repository and merges them into your current branch. Useful to synchronize your local repository with the latest changes made by others.

* **git push**: Sends your committed changes from your local repository to the remote repository. After making changes locally and committing them, git push uploads these changes to the corresponding branch on the remote repository, making them accessible to others working on the project.

* **git commit**: Records changes made to the files in your local repository. It captures a snapshot of the current state of files you have staged (added) and stores it as a commit object in the local Git repository. Each commit is accompanied by a commit message that describes the changes made.

It is really important that you try to install and familiarise yourself with Git; it's one of the main parts of a data scientist's toolbox. In particular, you should try cloning, or forking the repository which this teaching material is held in, and then try editing and running the scripts (if this is not possible for any reason, simply download the repository).

### Section 1.3: Python

Python is a high-level, interpreted programming language characterized by its simplicity and readability. Its key features, including the programming paradigms that it supports, are:

* **Dynamic Typing:** Python uses dynamic typing, meaning variables are not explicitly declared with types. Instead, types are inferred at runtime based on the assigned value. This flexibility allows for easier and more concise code but can lead to runtime errors if types are mismatched.

* **Procedural Programming:** Python supports procedural programming where tasks are structured into procedures or functions that perform operations on data. It emphasizes step-by-step instructions for solving problems, making it straightforward for beginners and useful for organizing code into reusable components.

* **Object-Oriented Programming (OOP):** Python is also an object-oriented language, where data and functionality are encapsulated into objects. Objects can interact with each other through methods (functions belonging to classes) and inheritance (deriving new classes from existing ones), promoting modularity and code reuse.

* **Functional Programming:** Python incorporates functional programming concepts such as first-class functions (functions treated as first-class citizens) and lambda expressions (anonymous functions). Functional programming emphasizes functions as the primary building blocks, promoting immutability and pure functions for predictable behavior.

* **Automatic Memory Management:** Python features automatic memory management through garbage collection. It automatically allocates memory for new objects and reclaims memory occupied by objects no longer in use, relieving developers from manual memory management tasks present in lower-level languages like C or C++.
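
To make the list above a little more concrete, here is a minimal sketch that touches on dynamic typing and on the three programming styles. All of the names (`answer`, `double`, `Counter`, `squares`) are hypothetical, and functions and classes are only covered properly later in the module, so treat this as a preview rather than something to memorise:

```python
# Dynamic typing: the same name can point to objects of different types.
answer = 42
print(type(answer))   # <class 'int'>
answer = "forty-two"
print(type(answer))   # <class 'str'>


# Procedural style: a function that performs one step on some data.
def double(number):
    return number * 2


# Object-oriented style: data (count) and behaviour (increment) live together.
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count


# Functional style: pass an anonymous (lambda) function to another function.
squares = list(map(lambda number: number ** 2, [1, 2, 3]))

print(double(4))               # 8
print(Counter().increment())   # 1
print(squares)                 # [1, 4, 9]
```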

Python's strengths lie in its extensive standard library and third-party packages, facilitating diverse applications in web development, scientific computing, and data analysis. Its interpreted nature allows for rapid development and debugging, while its ecosystem supports frameworks (e.g., Django, Flask), libraries (e.g., NumPy, pandas), and tools (e.g., Jupyter Notebooks) essential for modern software development and data science workflows.

#### Section 1.3.1: Python 2.* vs Python 3.*

Python 2 and Python 3 represent two major versions of the Python programming language, each with distinct differences. Python 2, released in 2000, was widely used but reached its end of life in 2020, ceasing official support and updates. Python 3, introduced in 2008, aimed to fix design flaws, improve consistency, and enhance performance. Key disparities include print syntax (parentheses in Python 3), integer division behavior (Python 3 returns float by default), and Unicode support (Python 3 handles strings as Unicode by default). Python 3 offers significant improvements, yet migration challenges exist due to codebase compatibility issues, impacting adoption rates.
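
The three differences mentioned above can be checked directly in Python 3. This is a small sketch with arbitrary values; the comments describe the Python 3 behaviour only:

```python
# print is a function in Python 3, so its argument goes inside parentheses.
print(7 / 2)      # 3.5: dividing two integers returns a float
print(7 // 2)     # 3: floor division keeps only the whole-number part
print(len("€"))   # 1: a string is a sequence of Unicode characters
```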

### Section 1.4: Your turn!

Have you used the command line before? What about Git or Python? Can you name some data scientific endeavours which utilise these tools particularly well?

## Section 2: Primitive Data Types

Before we get to objects, which are the abstract building blocks of data, let's first introduce some primitive data types:

* Characters: 'A', '!', '1' -- one single 'glyph'
* Numbers, of which there are two types:
    * Integer: 1, 2, 3, -500, +600
    * Float: 1.123, 0.1232534, -4123.123123

### Section 2.1: Characters

A character is a single 'glyph' that is included in a 'character set'. Some characters are visible, such as the letter A (or 'a' - note case sensitivity will be *extremely* important, as we'll see later) and some are 'invisible' to some extent. A word, for example, is simply a sequence of characters. Within this specific Jupyter Notebook, multiple characters make content in 'markdown'. Characters are also joined together to form variable names, or to control the flow of a Python program. There are three important invisible characters:

1. Space (the common whitespace which is separating these words)
2. Tab (which is usually about four spaces long - more in Lecture Two when we talk about indenting)
3. Newline character (which tells the computer to move to the next line)

Let's look at some examples, and introduce the hugely important `print` command:

In [1]:
print('1. hello friends! how are you today?')
print('2. hello friends! \nhow are you today?')
print('3. hello friends! \thow are you today?')

1. hello friends! how are you today?
2. hello friends! 
how are you today?
3. hello friends! 	how are you today?


We can at this point introduce another very important command: `type`. This tells us what 'type' of object something is.

In [2]:
type('this is a group of characters')

str

In [3]:
type(1)

int

In [4]:
type(1.0)

float

In [5]:
type([1, 1.0])

list

Here are the ten most important types of object in Python, which we'll see most -- if not all -- of during this course:

1. int: Represents integer numbers (whole numbers without decimals).
2. float: Represents floating-point numbers (numbers with decimals or scientific notation).
3. str: Represents sequences of characters (textual data enclosed in quotes).
4. list: Represents ordered collections of items (mutable sequences enclosed in square brackets).
5. dict: Represents collections of key-value pairs (mutable mappings enclosed in curly braces).
6. tuple: Represents ordered collections of items (immutable sequences enclosed in parentheses).
7. bool: Represents boolean values (True or False).
8. set: Represents unordered collections of unique items (mutable collections enclosed in curly braces or created using the set() constructor).
9. NoneType: Represents the absence of a value (often used as a default return value).
10. function: Represents callable objects that can perform a task or return a value when called.
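
As a quick check of the list above, we can build one example of each type and ask Python for its name. This is only a sketch: the example values and the tiny `greet` function are hypothetical, and the list comprehension used here is explained in a later lecture.

```python
def greet():
    """A tiny hypothetical function, used only as an example object."""
    return "hello"


examples = [42, 3.14, "text", [1, 2], {"key": "value"},
            (1, 2), True, {1, 2}, None, greet]

# type(obj).__name__ gives the short name of each object's type.
type_names = [type(example).__name__ for example in examples]
print(type_names)
```

The names are printed in the same order as the numbered list above.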

### Section 2.2: Numbers

Numbers come in several **types** (of object), but two are specifically important to mention:

1. Integers - whole numbers, such as 1 or 42,
2. Floating point numbers - these allow for decimal points such as 2.17 or 3.33

Let's first consider this:

In [6]:
12 / 5

2.4

What's happened here? We've divided two integers to get a float! Floats can be distinguished from integers because they have a fractional part. We can *force* a number to be either an integer or a float using int() and float():

In [7]:
int(12/5)

2

In [8]:
float(12/5)

2.4

In [9]:
type(int(12/5))

int

In [10]:
type(12/5)

float

Note also that we can nest these commands inside one another. Python evaluates them from the inside out, so the outermost one determines the type of the final result:

In [11]:
type(float(int(12/5)))

float
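
One detail worth checking for yourself: `int()` does not round, it simply drops the fractional part (truncating towards zero). The short sketch below uses arbitrary values, and shows the built-in `round()` purely for comparison:

```python
print(int(2.9))     # 2: the fractional part is dropped, not rounded
print(int(-2.9))    # -2: truncation is towards zero
print(round(2.9))   # 3: round() goes to the nearest whole number
print(float(2))     # 2.0: an integer forced into a float
```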

Note, also, here, a couple of important things about Python:

1. It is case sensitive. What happens if you do `type(Float(int(12/5)))`?
2. Every time we open a bracket, or parenthesis, we need to close it. What happens if you do `type(float(int(12/5))`?

### Section 2.3: Strings

A character isn't very useful on its own. As alluded to above, multiple characters together form a string. In Python, strings are enclosed in quotation marks. You can use a variety of quotation marks to enclose the string (i.e. single or double). Two things to remember:

1. Always close the string with the same quotes used to open it,
2. Always escape a quotation character if you use it inside the string.

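```python
'This is a string.'

"This is also a string."

'''This is yet another string!
Because it is enclosed in triple quotes,
it can run across
several lines.'''

# The next line is left as a comment because it raises a SyntaxError:
# ''This is not a string, but why?''
```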

In Python 3.*, `print` is a function, so whatever we want to print goes inside parentheses, like the following:
```python
print("This is a string!")
```

How does this differ from Python 2.x? 

Now let's **assign** a string to a variable, then print the variable:

In [12]:
SomeVar = "This is a string. It has been assigned to a variable called 'SomeVar'"
print(SomeVar)

This is a string. It has been assigned to a variable called 'SomeVar'


A variable is a name that refers to an object. The same name can later be assigned to a different object, which is why we say that its contents can change.

There are some really important variable naming conventions:

* ALLCAPS means a variable that we like to keep constant, like a secret key.
* a leading underscore, as in '_name' (or a double underscore, as in '__name'), means a variable that is hidden and shouldn't be referenced directly.
* variable names must start with a letter or an underscore (and _not_ a number).
* use a consistent style, such as camelCaseNames or underscore_variable_names.
* alllowercasenounderscorenames are very hard to read.
* try to give a variable a relevant name; it will make your life much easier!
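
A short sketch of these conventions in practice. Every name and value here is hypothetical; the string method `isidentifier()` simply reports whether a piece of text would be accepted as a variable name:

```python
API_KEY = "not-a-real-key"   # ALLCAPS: treat this as a constant
_cache = {}                  # leading underscore: internal, hands off
average_income = 31250.0     # underscore_style with a descriptive name

# isidentifier() checks whether a string is a valid variable name.
print("average_income".isidentifier())  # True
print("2nd_income".isidentifier())      # False: it starts with a number
```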

If we want to join a float or int onto a string before printing it, we have to force its type to a string.

In [13]:
RandomNumber = 2834
print('When printing numbers, we need to convert into a string if we are joining them to a string: ' + str(RandomNumber))

When printing numbers, we need to convert into a string if we are joining them to a string: 2834


But, if we are simply printing the numbers, we don't need to do that:

In [14]:
print(5.00+RandomNumber)

2839.0


### Section 2.4: 📝 Character Sets 📝

#### Section 2.4.1: ASCII

Strings are drawn from character sets. Loosely, 'the alphabet' is a character set, but not a very useful one, because it's so limited. The basic Western character set is ASCII. It has 128 code points. The first 32 are control characters, like 'new line', and the remainder are (almost entirely) the upper and lower case alphabet, ten digits and punctuation characters. ASCII is not really sufficient for most languages or most of our data intensive purposes.

In [15]:
# Printing all ASCII characters (from 0 to 127)
''.join([chr(i) for i in range(128)])

'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f'

Some of these should be familiar to you, some might not be. Also note: why didn't we have to print that to see the output of it?
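
Each character corresponds to a numbered code point, and the built-in functions `ord()` and `chr()` convert between the two. A small sketch with arbitrarily chosen characters:

```python
print(ord("A"))            # 65: the code point of upper-case A
print(ord("a"))            # 97: lower case has a different code point
print(chr(66))             # B
print(chr(ord("a") + 1))   # b: the character one code point after 'a'
```

This is one way to see why case matters so much: to Python, 'A' and 'a' are simply different numbers.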

#### Section 2.4.2: UTF-8

Unicode is a very large character set, with room for over a million code points. As such, Unicode includes most characters from most languages around the world, as well as the emergent emoji character set. UTF-8 is the most widely used way of encoding those code points as bytes, and Python 3 makes working with it pretty straightforward.


In [16]:
print("UTF-8 Examples:")
print("Smiley face: 😊")  # U+1F60A
print("Snowflake: ❄️")   # U+2744
print("Copyright symbol: ©")  # U+00A9
print("Currency symbols: €, ¥, £")  # U+20AC, U+00A5, U+00A3
print("Mathematical symbols: ∑, ∫, √")  # U+2211, U+222B, U+221A
print("Greek letters: Ω, α, β")  # U+03A9, U+03B1, U+03B2
print('\U0001f334') # This is the emoji code point. 
print(b'\U0001f334') # This is what happens when you print it as a 'bytestring'
print('🌴') # You can print emoji directly
print('🌴' in 'Yeah, great job! 🌴')

UTF-8 Examples:
Smiley face: 😊
Snowflake: ❄️
Copyright symbol: ©
Currency symbols: €, ¥, £
Mathematical symbols: ∑, ∫, √
Greek letters: Ω, α, β
🌴
b'\\U0001f334'
🌴
True


What is going on with the last example here?
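
It can help to separate two ideas: a string is a sequence of code points, while UTF-8 is a recipe for turning those code points into bytes. The sketch below (variable names are our own) encodes the palm tree emoji and then decodes it again:

```python
palm = "\U0001f334"              # a string holding one code point
encoded = palm.encode("utf-8")   # the bytes used to store or send it

print(len(palm))                         # 1: one character
print(len(encoded))                      # 4: stored as four bytes
print(encoded.decode("utf-8") == palm)   # True: decoding recovers the string
```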

### Section 2.5: String manipulation

It is critical to note that, as opposed to some other languages, strings are [indexed starting from 0](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html) and work sequentially forward. So in the string "python is the best", there is a 'p' as the 0th element and an 'n' as the 5th.

In [17]:
variable = "python is the best"
print(variable[0], variable[1], variable[2],
     variable[3], variable[4], variable[5])
print(variable[10])

p y t h o n
t


The above hopefully shows us that a string is really just a list of characters (as in a series of characters that one would string together).

#### Section 2.5.1: Zero Indexing

Zero indexing in programming starts counting elements from 0 rather than 1. Python uses zero indexing for consistency with C and other languages, aligning well with memory addressing and simplifying operations like array access and iteration. This convention ensures compatibility and efficiency in programming practices.
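
One practical consequence is that the last valid index is always one less than the length of the string. A minimal sketch, using a hypothetical variable called `word`:

```python
word = "python"

print(len(word))                # 6: six characters in total
print(word[0])                  # p: the first character is at index 0
print(word[len(word) - 1])      # n: the last character is at index 5
print(list(range(len(word))))   # [0, 1, 2, 3, 4, 5]: every valid index
```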

Can you print out the 'b' from this variable? Can you think of a shortcut to do it?

In [18]:
print(variable.find('b'))

14


In [19]:
print(variable.isalnum())

False


#### Section 2.5.2: Standard String Methods

Let's look at some standard string **methods** (a method is 'attached' to an object):

* upper: change to upper case
* lower: change to lower case
* title: change to title case (compare capitalize, which only upper-cases the very first character)
* find: return index of first instance of input
* isalnum: is this string alphanumeric?
* isalpha: is this string just letters?
* replace: find all instances of something and change to something else
* strip: remove whitespace characters from the start and end of a string (useful when reading in from a file)

The period [.] is used to link the object to the **method**. So if we have a string object:

"This is an object"

And we attach the 'upper' method like so:

"This is an object".upper()

Note that some methods take **arguments**!

Try it below using ```SomeVar``` from above:

In [20]:
print(SomeVar)
print(SomeVar.upper())
print(SomeVar.lower())
print(SomeVar.title())
print(SomeVar.find('i'))
print(SomeVar.isalnum())
print(SomeVar.isalpha())
print(SomeVar.replace(' is ',' is not ').replace('a string', 'a banana')) #we can 'chain' methods together!
print(SomeVar.strip(' '))

This is a string. It has been assigned to a variable called 'SomeVar'
THIS IS A STRING. IT HAS BEEN ASSIGNED TO A VARIABLE CALLED 'SOMEVAR'
this is a string. it has been assigned to a variable called 'somevar'
This Is A String. It Has Been Assigned To A Variable Called 'Somevar'
2
False
False
This is not a banana. It has been assigned to a variable called 'SomeVar'
This is a string. It has been assigned to a variable called 'SomeVar'
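
Because `SomeVar` has no whitespace at either end, `strip` appears to do nothing in the output above. The sketch below uses a hypothetical string with padding (`messy`) to show its effect, and also contrasts `title` with `capitalize`; `repr()` is used so that the remaining spaces are visible inside the quotes:

```python
messy = "  household income \n"

print(repr(messy.strip()))           # 'household income'
print(repr(messy.rstrip()))          # '  household income'
print("data science".title())        # Data Science
print("data science".capitalize())   # Data science
```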


We can also get help on specific methods using a syntax such as ```help(SomeVar.title)``` (similar to Stata). We can also get a list of all methods associated with an object using ```dir(object)```. To determine the *type* of object, we can utilize ```type(object)```, and to get detailed help on any object or method ```help(object)```. Let's try a couple out:

In [21]:
type(SomeVar)

str

In [22]:
dir(SomeVar)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'stri

In [23]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  

### Section 2.6: Special Characters

What if we need to print a quotation character inside a string that uses quotes? Introducing the **escape character**! The escape character is the backslash: in order to print a quotation rather than use it to end the string, you would type:
```python
"Escaping a \" in a string"
```
However, sometimes you can sidestep this by using a different quotation type within the string itself:
```python
"This will 'work'."
'This will also "work".'
'''This will work for both " and ' types.'''
```
The triple quote is used for block quotes, where you can just keep writing across lines. Let's try it out, and escape the escape character also (with a newline thrown in for good measure):

In [24]:
print('This will also "work".')
print("If you haven't inserted \\ characters\nThis will be \"totally\" broken")

This will also "work".
If you haven't inserted \ characters
This will be "totally" broken
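
For completeness, here is a small sketch of a triple-quoted string in action. The text and the variable name `block` are made up; the point is that the string runs across a line break and holds both kinds of quotation mark without any escaping:

```python
block = '''This string spans two lines
and holds both " and ' characters.'''

print(block)
print(block.count("\n"))   # 1: the line break is stored as a newline character
```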


### Section 2.7: Combining strings

Imagine you have two words that you wish to add together, such as 'Data' and 'Science'. There are several ways to do this.

#### Section 2.7.1: Concatenation

In Python the + symbol means **concatenate** when it appears between two strings. This is the simplest way to combine two strings. Here are some ways to concatenate. Try them out, but don't forget the whitespace if necessary:

In [25]:
var1 = "Steak"
var2 = " or. "
var3 = "Fish?"
print(var1 + var2 + var3)
print(type(var1+var2+var3))

Steak or. Fish?
<class 'str'>


Note that the + symbol is used for both addition *and* concatenation. So be careful: if you mix strings and numbers, Python will raise an error and print a **traceback** (try ```print(1 + '2')```...: what do you get?)

To make a number into a string, you can use the string function (```str()```):

In [26]:
num = 123
strNum = str(num)
print(strNum)

123


#### Section 2.7.2: Insertion

Sometimes you want to insert something in the middle of a statement but don't want to merely concatenate. Maybe you have a collection of things and want to insert them in a lot of places. The bonus is that you can also print digits really nicely this way.

In [27]:
print("Pi to two decimal places is %1.2f. Isn't that convenient?" % 3.14159)

Pi to two decimal places is 3.14. Isn't that convenient?


Another way to do this is by using 'f-strings':

In [28]:
print(f"Pi to two decimal places is {3.14159:.2f}. Isn't that convenient?")

Pi to two decimal places is 3.14. Isn't that convenient?
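
f-strings can also take a format specification after a colon inside the braces, much like the `%1.2f` used earlier. A brief sketch with hypothetical values (`share` and `population` are made-up numbers):

```python
share = 0.4567           # hypothetical proportion
population = 67000000    # hypothetical count

print(f"Share: {share:.2f}")           # Share: 0.46
print(f"Share: {share:.1%}")           # Share: 45.7%
print(f"Population: {population:,}")   # Population: 67,000,000
```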


#### Section 2.7.3: Joining

Sometimes you want to join strings together with a specific separator:

In [29]:
";".join(["I want to"," join this together"])

'I want to; join this together'

More commonly, you want to join a list of words on whitespace to make a sentence: ```' '.join(list)```. We'll learn a lot more about lists in the future.
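
A minimal sketch of joining on whitespace, using made-up lists called `words` and `years`. Note that `join` only accepts strings, so numbers have to be converted with `str()` first:

```python
words = ["Data", "science", "is", "fun"]
sentence = " ".join(words)
print(sentence)   # Data science is fun

# join() only accepts strings, so numbers must be converted first.
years = [2022, 2023, 2024]
joined_years = ", ".join(str(year) for year in years)
print(joined_years)   # 2022, 2023, 2024
```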

#### Section 2.7.4: Splitting

If you can join strings together, you can also split them (into a **list**: see Section 3.1 below -- this will be our first **collection**)! This is crucial for data cleaning, especially with _free text_ like social media data. The default way to split the data is using the whitespace character, but we can also split on specific substrings:

In [30]:
BigChunk = 'Lets split this into chunks'
print(BigChunk.split(' '))
print(type(BigChunk.split(' ')))

['Lets', 'split', 'this', 'into', 'chunks']
<class 'list'>
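
There is a subtle difference between `split(' ')` and calling `split()` with no argument at all: the latter splits on any run of whitespace, including tabs and newlines. A small sketch with a deliberately untidy, hypothetical string:

```python
messy = "Lets  split\nthis"   # two spaces, then a newline

print(messy.split(" "))   # ['Lets', '', 'split\nthis']
print(messy.split())      # ['Lets', 'split', 'this']
```

For free text, the no-argument form is often the more forgiving choice.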


### Section 2.8: Your turn!

Write two strings. One should be about your favourite food, and one should be about your favourite drink. Join them together in any way that you like.

## Section 3: Collections

Virtually every programming language has a notion of a collection. A collection is a means for referring to one or more things at the same time, and Python has many collection types (note: a string can be thought of as a joined up list of characters). In general, collections are **iterable**, which means that you can ask for each item in the collection one-by-one. But beyond that they vary quite dramatically. Here are the major collection *types* that you will come across in Python:

### Section 3.1: Lists

A list is a sequential (the order is relevant), zero-indexed (first item is indexed at 0, just as with strings) and *mutable* (you can add or delete elements) collection signified by ```[... , ...]``` (we saw this above with the split). Let's make and play around with a list:

In [31]:
my_research_interests = ["computer science", "economics", "social science"]
print('My first and third favourite interests are: ' +
      my_research_interests[0] + # Note how we're breaking the lines here!
      ' and ' +
      my_research_interests[2])

My first and third favourite interests are: computer science and social science


Importantly, we can also append onto lists:

In [32]:
my_research_interests.append("data science")
print(my_research_interests)

['computer science', 'economics', 'social science', 'data science']
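
Appending is only one of the ways in which a list can be changed. The sketch below works on a separate, hypothetical list called `interests`, and shows a few other standard list operations:

```python
interests = ["computer science", "economics", "social science"]

interests.insert(1, "statistics")   # add at a given position
interests[0] = "data science"       # overwrite an element
interests.remove("economics")       # delete by value
last = interests.pop()              # delete (and return) the last element

print(interests)   # ['data science', 'statistics']
print(last)        # social science
```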


### Section 3.2: Tuples

A tuple is a sequential, zero-indexed and *immutable* collection signified by ```(... , ...)```: a list you can't change, denoted by parentheses rather than square brackets. Tuples are used in lots of places where you don't want a collection to change size or contents, or where you want your object operations to be a little faster than with a list:

```python
my_research_interests = ("computer science", "economics", "social science")
```

Can you see the similarity with a list? What is different?
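
Immutability can be checked directly. In this sketch (using a hypothetical tuple called `interests`), reading from the tuple works just as with a list, but trying to change it raises an error. The `try`/`except` construct used to catch that error is covered later in the module:

```python
interests = ("computer science", "economics", "social science")

print(interests[0])   # indexing works exactly as with a list

try:
    interests[0] = "data science"   # but assignment is not allowed
except TypeError as error:
    print("TypeError:", error)

print(hasattr(interests, "append"))   # False: tuples cannot grow
```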

#### Section 3.2.1: Querying and Slicing Lists/Tuples

You can index a list just like a string, and just like strings, you can ask for a range of values (a 'slice') using a colon. Asking for a single index that is out of range raises an error, whereas a slice that runs past the end is simply cut short:

```python
my_research_interests[0:2]
```

Note here that the return is itself a list (or a tuple, if you slice a tuple). If we want a specific string, we can index the new collection:

```python
my_research_interests[0:2][0]
```
Can you figure out how to get a specific character from that string, all in one operation?

You can also index from the end of the list/tuple/string to 'walk backwards'. This is done with negative numbers:

```python
my_research_interests[-1]
```
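
A short sketch of negative indices and slices, on a hypothetical list called `interests`:

```python
interests = ["computer science", "economics", "social science"]

print(interests[-1])     # social science: the last element
print(interests[-2:])    # ['economics', 'social science']: the last two
print(interests[1:10])   # ['economics', 'social science']: the slice is cut short
```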

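### Section 3.3: Dictionaries

A dictionary is a key-indexed and mutable collection signified by ```{... : ... , ... : ... }```. Rather than by position, its items are looked up by key (since Python 3.7, a dictionary also remembers the order in which its keys were inserted). Like in English, where a dictionary defines a word, a dictionary in Python uses a key to fetch a value. Here, the term we use is 'key-value' pairs: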

```python
FamousSociologists = {"Marx":"1","Weber":"2", "Durkheim":"3"}
```

In [33]:
FamousSociologists = {"Marx":"1", "Weber":"2", "Durkheim":"3"}
FamousSociologists['C. Wright Mills']="4" #add a new key:value pair 'on the fly'

print(FamousSociologists.keys())
print(FamousSociologists.values())
print(FamousSociologists.items())
print(FamousSociologists['Weber'])

dict_keys(['Marx', 'Weber', 'Durkheim', 'C. Wright Mills'])
dict_values(['1', '2', '3', '4'])
dict_items([('Marx', '1'), ('Weber', '2'), ('Durkheim', '3'), ('C. Wright Mills', '4')])
2
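
A few more standard dictionary operations, sketched on a separate copy of the data (here called `sociologists`). In particular, the `get` method avoids an error when a key is missing; the key 'Simmel' is a hypothetical example of such a missing key:

```python
sociologists = {"Marx": "1", "Weber": "2", "Durkheim": "3"}

print("Weber" in sociologists)             # True: membership tests the keys
print(sociologists.get("Weber"))           # 2
print(sociologists.get("Simmel"))          # None: no error for a missing key
print(sociologists.get("Simmel", "n/a"))   # n/a: a default of our choosing
print(len(sociologists))                   # 3
```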


Let's set up a dictionary of food names and types to show that we can **nest** collections:

In [34]:
food = {
    "Tofu": "Delicious",
    "Fruit": "Healthy",
    "Water": ["Healthy", "Not delicious"],
}
food["Water"]

['Healthy', 'Not delicious']

Note: this principle is largely how the .json data format works. JSON (JavaScript Object Notation) is a lightweight data-interchange format. It uses key-value pairs to represent data, organized in a readable and easy-to-parse format. JSON supports nested structures, arrays, and various data types like strings, numbers, booleans, and null, making it widely used for transmitting and storing structured data.
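
To illustrate the connection, here is a minimal sketch using the standard library's `json` module to turn the nested dictionary into JSON-formatted text and back again (the names `as_text` and `round_trip` are our own):

```python
import json

food = {
    "Tofu": "Delicious",
    "Fruit": "Healthy",
    "Water": ["Healthy", "Not delicious"],
}

as_text = json.dumps(food)         # dictionary -> JSON-formatted string
round_trip = json.loads(as_text)   # JSON-formatted string -> dictionary

print(type(as_text))            # <class 'str'>
print(round_trip == food)       # True
print(round_trip["Water"][1])   # Not delicious
```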

### Section 3.4: Advanced List Operations

#### Section 3.4.1: Slicing of lists

We can do more than simply query a list by its index. Indices can also be negative numbers: when we use negative numbers we are indexing the list from the end, rather than the front (we briefly saw this above). We can also ask for a part of a list in a range. This is called 'slicing'. Finally, if we are working with characters, we can chop up a string into a list, or take a list and join it together as a string. You can index a list using the []. To slice a list, you would use the : inside the []. Let's try an example of a slice:

In [35]:
mylist = ['sociology', 'economics', 'political science', 'social policy']
print(mylist[2:4])

['political science', 'social policy']
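
Either end of a slice can be left out, in which case it runs from the start or to the end of the list. A third number, which is not used elsewhere in this lecture, sets the step size. A brief sketch:

```python
mylist = ['sociology', 'economics', 'political science', 'social policy']

print(mylist[:2])    # ['sociology', 'economics']
print(mylist[2:])    # ['political science', 'social policy']
print(mylist[-1])    # social policy
print(mylist[::2])   # ['sociology', 'political science']: every second item
```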


What happens if we try and call an index not in the range of the list? i.e. ```print(mylist[6])``` ?

Now you try: define your own list (it can be as long as you like) and index and slice it in various ways. Note, we index and slice strings in exactly the same way as we do lists:

In [36]:
FoodString = 'Tofu is the most delicious food.'
print(FoodString[:4])
print(FoodString[17:26])
print(FoodString[12:26])

Tofu
delicious
most delicious


We can also find the position at which something occurs with the ```find``` method:

In [37]:
FoodString.find('Tofu')

0

#### Section 3.4.2: Splitting lists

Strings behave like a special kind of list that only includes characters. We can query and slice strings the way we do lists. We can also alternate between strings and lists using ```.split()``` and ```.join()```, i.e.:

In [38]:
oldstring = 'History repeats itself,\n' + \
            'first as tragedy,\n' + \
            'second as farce.'
newlist = oldstring.split(' ')
print(newlist)
newstring = ' '.join(newlist)
print('\n\n' + newstring)

['History', 'repeats', 'itself,\nfirst', 'as', 'tragedy,\nsecond', 'as', 'farce.']


History repeats itself,
first as tragedy,
second as farce.


Note again how we are breaking lines (PEP 8). We can also re-join a split string!

Here (below) we are splitting the string on the '.', and re-joining them with ' '.

In [39]:
iwanttobreakfree = "I.want.to.break.free."
print(iwanttobreakfree)
godknows=iwanttobreakfree.split('.')
print(godknows)
godknowsiwanttobreakfree=" ".join(godknows)
print(godknowsiwanttobreakfree)

I.want.to.break.free.
['I', 'want', 'to', 'break', 'free', '']
I want to break free 
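
Notice the empty string at the end of the split list, and the stray space that it leaves at the end of the re-joined sentence. One possible way around this, sketched here with hypothetical variable names, is to strip the trailing '.' before splitting:

```python
lyric = "I.want.to.break.free."

# The final '.' leaves an empty string at the end of the split list.
print(lyric.split("."))   # ['I', 'want', 'to', 'break', 'free', '']

# Stripping the trailing '.' first avoids the stray trailing space.
clean = " ".join(lyric.strip(".").split("."))
print(repr(clean))        # 'I want to break free'
```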


[Note: why are these variable names 'bad'?]

What happens if we don't respect the case?

```python
FoodString.find('tofu')
```

### Section 3.5: Your turn!

Build a dictionary which contains -- as keys -- your favourite foods. As values, put an ordinal ranking of them. Appreciate the object types that you're working with!

## Optional Homework:

What is a set? Is it ordered, mutable, etc? What benefits does it provide over a list, tuple or dictionary? What about if there are duplicates in the set? Sets will also feature at the start of (non-optional) Homework Two...

What code within this notebook complies with [PEP 8](https://www.python.org/dev/peps/pep-0008/)? What (deliberately...) does not?

How is this different to the statistical software that you are more conventionally used to?


## Non-Optional Homework!

See Homework_One.ipynb in the 'Homeworks' section of the course materials! We will run our randomiser for the first time at the start of the following class.