# -1. Disclaimer
Many of the materials in this Notebook are gently stolen from the following courses:
- **["A Python Course for the Humanities"](https://github.com/fbkarsdorp/python-course)** a course designed by Folgert Karsdorp and Maarten van Gompel
- and later modified by Mike Kestemont and Lars Wieneke for the course **["Programming for Linguistics and Literature"](https://github.com/mikekestemont/prog1617)**
- **["Python for text analysis"](https://github.com/cltl/python-for-text-analysis)** designed by H.D. van der Vliet and taught at the Vrije Universiteit
- **["How to Think Like a Computer Scientist"](http://www.greenteapress.com/thinkpython/thinkCSpy.pdf)** by Allen Downey, Jeffrey Elkner, Chris Meyers

If things remain unclear, please go through the [NLTK Book](https://www.nltk.org/book/), Chapter 1 ["Language Processing and Python"](https://www.nltk.org/book/ch01.html).

# 0. Before we kick off: Installing Jupyter Notebook

- Download Anaconda: https://www.anaconda.com/download
        Select the Python 3.6 Version
        Follow the installation instructions
- Download the Notebook and data [here](https://github.com/kasparvonbeelen/Python-Slow-Learning)
        Open Anaconda Navigator
        Launch Jupyter Notebook
        This should open a tab in your browser
        Go to the location where you cloned/unzipped the material downloaded from Github

# 1. Philosophy and Goals of the Course

- You can only learn to code by **doing**. Therefore this course consists of a series of **workshops** that teach you to tackle data-related problems of increasing complexity.

- The focus is on processing and manipulating **social media data**: Coding the Humanities approaches Media Studies from the perspective of **data science**.
- You won't necessarily become a professional coder after this course--this requires more time and effort. You will, however, learn to apply coding techniques to **specific research scenarios** relevant for Media Studies. This course will lay the groundwork for those who are interested in becoming Digital Humanities specialists.

- Coding is **not** difficult, but obtaining basic programming skills requires a **sustained effort**.
- With only a few basic skills you can go a long way (writing scripts vs. developing tools).
- The full course, with all the details, is available [here](https://github.com/kasparvonbeelen/CTH2019) (but still under construction).
- It takes a while before you can do some more fancy stuff (you have to go through kindergarten again before you become a rocket scientist).

#### Exercise

Open this Notebook in Anaconda
- Download the Materials from: https://github.com/kasparvonbeelen/CTH2019 or from Canvas
- In case you retrieved the material via GitHub: Unzip the downloaded folder (you can unzip the materials anywhere, but remember the location)
- Open the Anaconda Navigator, click on Launch Jupyter Notebook
- This should open a tab in your browser: go to the location where you unzipped (or saved) the materials and open the file whose name starts with "Lecture 1 Python Basics Part 1" and ends with ".ipynb".

#### Difference with the Lab sessions of Information and the Digital? 
- Today, in the first lesson, we look at some of the topics covered in the Lab sessions, but in the subsequent lectures, we take a different route.
- This course focuses more on Social Media research and gives less attention to the fundamentals of coding in Python.
- The lectures investigate language and behaviour on Social Media. We will mainly use [Pandas](https://pandas.pydata.org/), a Python Data Science library.
- The course is mainly based on Jake VanderPlas's [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/). The Notebooks contain a simplified version of this book that ignores the technical aspects of the Python syntax. However, I suggest you go through at least Chapter 3: [Data Manipulation with Pandas](https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html).

### 1.1 The Data of Choice: YouTube and Facebook

As said, this course focuses on working with Social Media data, especially on the content analysis of these data. Our data are sourced from the [Social Media Platforms YouTube and Facebook](https://wiki.digitalmethods.net/). We rely on [tools](https://wiki.digitalmethods.net/Dmi/ToolDatabase) created by the Digital Methods Initiative. More specifically:
- [YouTube tools](https://tools.digitalmethods.net/netvizz/youtube/)
- [Netvizz](https://apps.facebook.com/107036545989762/)

Run the lines below (type ctrl + enter) to inspect an [example](https://raw.githubusercontent.com/kasparvonbeelen/CTH2019/master/data/videoinfo_-OLEyOYC6P4_2018_12_18-09_26_24_comments.tab) dataset retrieved from YouTube. Don't worry about the details now, the syntax will be made clear in the next lectures.

In [3]:
import pandas as pd

In [4]:
df = pd.read_csv('https://raw.githubusercontent.com/kasparvonbeelen/CTH2019/master/data/videoinfo_-OLEyOYC6P4_2018_12_18-09_26_24_comments.tab',sep='\t')
df.head(10)

Unnamed: 0,id,replyCount,likeCount,publishedAt,authorName,text,authorChannelId,authorChannelUrl,isReply,isReplyTo,isReplyToName
0,Ugyus_zVValJXstM7O54AaABAg,0.0,0,2018-12-15 00:50:49,Good Morning Sunshine,zwar schon millionenfach gesagt aber was das F...,UCZ7eIUG1K8YlPuYFSsz2gKg,http://www.youtube.com/channel/UCZ7eIUG1K8YlPu...,0,,
1,UgwDGL7eJtJZcbNmkpB4AaABAg,0.0,0,2018-12-15 00:30:52,Der Frosch,"<a href=""https://www.youtube.com/watch?v=-OLEy...",UCqULQh3p1rcBeCoRTN4p8ng,http://www.youtube.com/channel/UCqULQh3p1rcBeC...,0,,
2,UgyBYuzokRtN8GH4ukh4AaABAg,0.0,0,2018-12-14 18:35:05,LOS CAMPADRES VARRIO 3 HAWTHORNE L.A,das war noch deutschland......echtes deutschla...,UC6SXCcMEMp5O84HcP4hdl5Q,http://www.youtube.com/channel/UC6SXCcMEMp5O84...,0,,
3,UgwW29yOMzrtt2kiJ0x4AaABAg,0.0,0,2018-12-07 13:29:29,Krillin 1993,Die Leute waren noch in den 80ern und 90ern vi...,UCdIxOAItNdKFAu9Prj5r7Sg,http://www.youtube.com/channel/UCdIxOAItNdKFAu...,0,,
4,UgzlW47YdDEmy9U1pSZ4AaABAg,0.0,0,2018-12-01 19:31:44,Logan Stroganoff,Love from a Florida kid that got introduced to...,UC9Ewr5Qf4fRqpQJZTISVYOg,http://www.youtube.com/channel/UC9Ewr5Qf4fRqpQ...,0,,
5,UgxYETfJj4xXohlsH454AaABAg,0.0,0,2018-11-30 18:40:32,lazarus wtf,techno ist eine kunst und avicii war ein sehr ...,UCyg3vy24eYE5UGTbrdp7akg,http://www.youtube.com/channel/UCyg3vy24eYE5UG...,0,,
6,UgyQF3mrx1XKp0UPYnh4AaABAg,0.0,0,2018-11-26 12:39:23,Lincoln Project DJ SBSW,was für eine goile Zeit! ;-),UCgvfb9p-0E7V5RnGlV1xE8w,http://www.youtube.com/channel/UCgvfb9p-0E7V5R...,0,,
7,UgwvNro4_PIkr0ksc9x4AaABAg,0.0,0,2018-11-26 02:40:10,MAKAROW,Berghain,UCVDVBRCYPpntvI35WQ6CMww,http://www.youtube.com/channel/UCVDVBRCYPpntvI...,0,,
8,UgxJgaRNzcdrL8BXOzt4AaABAg,0.0,0,2018-11-23 19:33:07,nouse44,geiler pc hater :D,UCXxr5x_m7LVgwnp3_ezTYYQ,http://www.youtube.com/channel/UCXxr5x_m7LVgwn...,0,,
9,Ugxmk0vG2HOrY89xDwp4AaABAg,0.0,0,2018-11-23 13:20:44,Made in West-Germany,10 werbeunterbechungen - nein danke. Ich schau...,UCcMPlZNIPyi9iWxd1QwwRbg,http://www.youtube.com/channel/UCcMPlZNIPyi9iW...,0,,


In [None]:
#### --Question--

What information do the columns contain?

### 1.2 The Language of Choice: Python

#### **What** is Python?

[From Wikipedia](https://en.wikipedia.org/wiki/Python_(programming_language)): Python is a widely used **high-level** programming language for **general-purpose** programming.
- **high-level programming language**: In computer science, a high-level programming language is a programming language with **strong abstraction from the details of the computer**. In comparison to low-level programming languages, it may use **natural language elements**, be easier to use, or may **automate** (or even **hide** entirely) significant areas of computing systems (e.g. memory management), making the process of developing a program simpler and more understandable relative to a lower-level language. The amount of **abstraction** provided defines how "high-level" a programming language is.


#### **Why** Python?

In general, Python is **easier to learn and to read**. Let's look at a very simple example. 

In [None]:
print('Hello, World.')

Compare this to the C++ version of  "Hello, World." which looks like this:

C++ code below:
``
#include <iostream>

int main()
{
    std::cout << "Hello, world." << std::endl;
    return 0;
}

``

End of C++ code.


So, in general, the reasons why I teach **Python** are:

- Software **Quality**: Python code is designed to be **readable**, and hence reusable and maintainable. 
- Developer **Productivity**: Python code is typically one-third to one-fifth the size of C++ or Java code. 
- **Portability**: Python code runs unchanged on all major computer platforms (Windows, Linux, MacOS). 
- **General-purpose**: data analysis, web development etc.
- **Support Libraries**: Standard, homegrown and third-party libraries.
- **Widely used by the academic and scientific community!**

# 2. Goal of Today's Lecture

Today we cover a few basic **Python data types and objects**
- Variable Assignment
- String formatting and working with text
- Indexing and slicing

This course is an **overview** of the Python language. You don't have to learn everything by heart, but treat the information in the Notebook as a **reference** for your later projects. I suggest going through this Notebook multiple times: it will lay a solid foundation for what is to come.

## 2.1 What's next?

- 09/01: Working with structured data: Exploring Pandas DataFrames;
- 14/01: Combining datasets, aggregation and comparison of information;
- 16/01: Semi-structured information: Content Analysis with Pandas and AntConc;
- 21/01: TBD (Classification and Topic Modelling);
- 23/01: No classes (time to work on the final assignment);
- 28/01: Walk-in Clinic (personal feedback and help for projects);
- 31/01: Final Assignment Due

## Intermezzo Using the Notebook Environment

# 3. Baby Python

For practising your coding skills, you can use the many **'code blocks'** in this Notebook, such as the grey cell below. Place your cursor inside the cell and press ``ctrl+enter`` to "run" or execute the code. Let's begin right away: run your first little program!

In [5]:
print('Hello, World!')

Hello, World!


You've just executed your first program!

#### --Exercise--
- Can you describe what the programme just did?
- Can you adapt it to print your name (with a greeting, i.e. "Hello, ...")?

Use the code block **below**.

In [21]:
# Insert your own code here!
# Print your own name ... or whatever you want, and press ctrl + enter
print ('Hello, Yifan')

Hello, Yifan


Besides printing words to your screen, you can use Python as a **calculator**. 

In [6]:
print(10)
print(5+9)
print(3*8)

10
14
24


> Please note that a string is always enclosed in **quotation** marks (which can be single *`'`* or double *`"`*), while a number (integer or float) is not.
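
The distinction matters as soon as you combine values. A minimal sketch (the variable names are invented for illustration and do not appear elsewhere in this Notebook): the `+` operator adds numbers, but it glues strings together, a behaviour we return to in Section 5.

```python
# Illustrative names only.
number_sum = 5 + 5        # two integers: arithmetic addition
string_sum = '5' + '5'    # two strings: the characters are joined

print(number_sum)
print(string_sum)
```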

#### --Exercise--

- print the number 5 as a string (i.e. with quotation marks)
- print the number 5 as an integer (i.e. without quotation marks)


In [20]:
# Insert code here
print ('"5"')
print (5)

"5"
5


#### --Exercise--
Use the code block below to calculate (and print) how many minutes there are in one week.

**HINT**: use the multiplication operator **`*`** (e.g. `5*4*4`)

In [7]:
# Insert code here
# 60 minutes per hour, 24 hours per day, 7 days per week
print(60*24*7)

10080


How many minutes have passed since your birth? (Approximately, of course: just use your age, for example the number of minutes in 21 years.)

In [8]:
# Insert code here
# 60 minutes per hour, 24 hours per day, 365 days per year, 21 years
print(60*24*365*21)

11037600


# 4. Variables: Presents for Everyone

One of the most powerful features of a programming language is the ability to **store and manipulate variables**. A variable is a **name** that refers to a value. The **assignment statement** creates new variables and relates them to concrete values. Instead of passing these elements as an argument to the `print()` function, we can **store** them, by creating a variable that refers to the "Hello, World!" string.

In [9]:
# declare a variable
x = 'Hello World.'
# print what is in the box
print(x)

Hello World.


In [10]:
# declare a variable
y = 22
# print what is in the box
print(y)

22


If you vaguely remember your math-classes in school, this should look familiar. It is basically the same notation with the name of **the variable on the left, the value on the right**, and the = sign in the middle. 

In the code block above, two things happen. **First**, we fill `y` with a value, in our case `22`. This variable `y` behaves pretty much like a **box** on which we write a `y` with a thick, black marker so that we can find it again later. **Second**, we print the contents of this box, using the `print()` function. <img src="https://github.com/kasparvonbeelen/CTH2019/raw/master/images/box.png">
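
To push the box metaphor a little further, here is a small sketch (the name `box` is hypothetical and only used here): the contents of a box can be replaced, and the new contents may even be of a different kind.

```python
# Hypothetical example: `box` is not used elsewhere in this Notebook.
box = 22          # put the value 22 in the box
print(box)

box = box + 1     # take the value out, add 1, and store the result again
print(box)

box = 'Hello'     # the same name can later refer to a value of another type
print(box)
```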

You can inspect the type of the variable with the `type()` **function**. You can use this function by putting the object between parentheses.

In [12]:
text = 'Hello, Worlds!'
print(type(text))
number = 10 
print(type(number))
number_string = '10'
print(type(number_string))

<class 'str'>
<class 'int'>
<class 'str'>


#### --Exercise--
Create and print two variables, one containing your name (string) and another containing your year of birth (integer).

In [16]:
# Write your code here
name = 'Yifan'
birth_year = 1997  # an integer, so no quotation marks
print(name, 'was born in', birth_year)

Yifan was born in 1997


# 5. Strings: How Python Understands Text

In the preceding sections, we learned how to define string variables.

In [13]:
x = 'Yo, You'
print(x)
print(type(x))

Yo, You
<class 'str'>


Let's have a closer look at the ``'str'`` type (str stands for string)

Similar to numbers, strings can also be added together. What do you think the operation below will produce? (pause a moment before running the code.)

In [14]:
first_name = "Kaspar"
last_name = "Beelen"
print(first_name+last_name)

KasparBeelen


This last operation is called string **concatenation**. We added one string to another using the `+` operator.
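
One caveat, sketched below with made-up values: `+` only concatenates strings with other strings. Mixing a string and an integer raises a `TypeError`, which is why the number is first converted with the built-in `str()` function. The `try`/`except` construction is only used here to show the error message without stopping the cell; it is not part of today's material.

```python
age = 21  # illustrative value

try:
    message = 'I am ' + age          # a string plus an integer fails
except TypeError as error:
    print('TypeError:', error)

message = 'I am ' + str(age)         # str() turns the number into a string first
print(message)
```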

In [15]:
book = "The Lord of the Flies"
print(first_name + " likes " + book + "?")

Kaspar likes The Lord of the Flies?


#### --Exercise--
Declare two variables `first_name` and `last_name`. Print them neatly using concatenation (with a space in between).

In [13]:
# Insert code here
first_name = 'Yifan'
last_name = 'Feng'
print (first_name + " " + last_name)

Yifan Feng


## 5.1 Variables and Strings

In [16]:
song = "Naturkatastrophenkonzert"
print(song)

Naturkatastrophenkonzert


As we know by now, such a piece of text ("Naturkatastrophenkonzert") is called a ``string`` in Python (cf. a **string (or sequence) of characters**). Strings in Python must always be enclosed with 'quotes' (either **single** or **double** quotes). 
> *Without quotes, Python will think it's dealing with the name of some variable that has been defined earlier because variable names never take quotes.* 

The following distinction is confusing, but extremely important (for this reason I repeat it here): variable names (without quotes) and string values (with quotes) look similar, but they serve a completely different purpose. Compare:

In [18]:
print('Hello')

Hello


This works fine.

In [17]:
print(Hello)

NameError: name 'Hello' is not defined

Ooops... this raises an error!

#### --Exercise--

Solve this error by using variable assignment (assign a variable with the name Hello to a string).

In [19]:
# Insert code here
Hello = "Hello"
print(Hello)

Hello


#### --Question--

The distinction between strings and variables is crucial. In the cell below, try to predict what the code will print, then run the cell to check your answer.

In [20]:
name = "Doris"
Doris = "name"
D = "D"
print(name)
print (Doris)
print(D)

Doris
name
D


## 5.2 Indexing and Slicing

Now that you know the difference between variables and string values, we can inspect these strings further. Strings are called strings because they consist of a **series** (or ``'string'``) of **individual** characters. We can access these characters in Python with the help of **``'indexing'``** because each character in a string has a unique **``'index'``** (i.e. an integer that points to the position of the character). To print the first letter of the variable `song`, you can type:

In [25]:
song_startswith = song[0]
print(song)
print(song_startswith)

Naturkatastrophenkonzert
N


### How does indexing work exactly?

![Indexes of the string Monty Python starting with 0](https://i.stack.imgur.com/vIKaD.png)
Take a look at the string "Monty Python". We use the index **`0`** to access the **first** character in the string. This might seem odd, but all indexes in Python start at **zero**. Whenever you count in Python, you start at `0` instead of `1`. Note that the **space character** gets an index too, namely 5. This is something you will have to get used to!
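
If you prefer to see the positions spelled out, the sketch below prints every index next to its character. It relies on `enumerate()` and a `for` loop, which have not been introduced yet, so treat it as a preview rather than something you need to understand now.

```python
title = "Monty Python"

# Print every index next to the character it points to.
for index, character in enumerate(title):
    print(index, repr(character))

print(title[5] == ' ')   # the space sits at index 5
```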

How do you access the last letter of "Naturkatastrophenkonzert"? Python has a `len()` function, which tells you how many elements a sequence contains.

The example below demonstrates how to access the last item. Do you understand the following statement?

In [26]:
# Print the last letter of `song` using a negative index
last_letter = song[-1]
print(last_letter)

t
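
The negative index is shorthand for counting back from the end. A minimal sketch of the longer route via `len()` (the names `length`, `last_by_length` and `last_by_negative` are invented for this illustration):

```python
song = "Naturkatastrophenkonzert"

length = len(song)                    # number of characters in the string
last_by_length = song[length - 1]     # counting starts at 0, so the last index is length - 1
last_by_negative = song[-1]           # shorthand for the same position

print(length)
print(last_by_length, last_by_negative)
```

Note that `song[length]` would raise an `IndexError`, because there is no character at that position.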


#### --Exercise--

Print the first and last character of your name using index notation.

In [27]:
# Insert code here
name = "YIFANFENG"
first_letter = name[0]
last_letter = name[-1]
print (first_letter)
print (last_letter)

Y
G


Now can you write some code that defines a variable `but_last_letter` and assigns to this variable the *one but last* letter of your name?

In [7]:
name = 'YIFANFENG' # enter your name as a string

In [10]:
# Insert code here
but_last_letter = name[-2]  # index -2 is the one-but-last character
print(but_last_letter)

N


You're starting to become a real expert in indexing strings. Now, what if we would like to find out what the first two or three letters of our name are? In Python we can use so-called **'slice-indexes' or 'slices'** for short. To find the first two letters of our name we type in:

In [28]:
first_two_letters = name[0:2]
print(first_two_letters)

YI


The `0` index is optional, so we could just as well type in `name[:2]`. This says: take all characters of the  `name` variable until you reach index 2 (i.e. up to the third letter, but not including the third letter). We can also start at index 2 and leave the end index unspecified:

In [29]:
without_first_two_letters = name[2:]
print(without_first_two_letters)

FANFENG


Because we did not specify the end index, Python continues until it reaches the end of our string. If we would like to find out what the last two letters of our name are, we can type in:

In [30]:
last_two_letters = name[-2:]
print(last_two_letters)

NG


### General Slice Syntax

The more general form of the Python slicing syntax has the shape
   
   `object[start:stop:step]`
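
As a rough illustration of the three parts (using a made-up variable `word`, not the `name` variable from the exercises): `start` is included, `stop` is excluded, and `step` says how far to jump each time.

```python
word = "Python"   # illustrative string

middle = word[1:4]        # start=1, stop=4 (excluded), step defaults to 1
every_second = word[:4:2] # from the beginning up to index 4, every second character
backwards = word[4:1:-1]  # a negative step walks backwards: indexes 4, 3, 2

print(middle)
print(every_second)
print(backwards)
```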
 


Complete the three exercises below, to check if you properly understand this syntax.

#### --Exercises--

Print your name, but only the characters with an even index.

In [68]:
# Insert code here
x = name[::2]  # start and stop omitted: the whole string, every second character
print(x)

YFNEG


#### --Exercise--

What happens when you use a negative step without defining the start and end of the slice?

In [43]:
# Insert code here
x = name[ : :-1]
print (x)

GNEFNAFIY


#### --Question--

Write down, in normal "human" language which elements are retrieved by the following slicing operations.

In [46]:
title = "Monty Python"
print(title[2:-1:2])
print(title[::-1])
print(title[:-2:3])
print(title[::])
print (title[::1])

nyPto
nohtyP ytnoM
MtPh
Monty Python
Monty Python


## 5.3 String Methods

Strings and numbers can be thought of as **objects**, "things you can do stuff with". In Python, each object has a set of **methods/functions** attached to it. If objects can be thought of as **nouns**, then methods/functions serve as **verbs**: they are the tools that operate on (do something with) these objects.

In general the methods (or functions) appear in these forms:
- `function(object,argument)`
- `object.method(arguments)`
    
In the example below, we apply the `len()` function to measure the number of characters in a string; the `.lower()` method lowercases all characters.

Both are called **fruitful** functions, as they return something (i.e. a number and a string respectively)
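
Because these functions *return* a value instead of changing the object, the result is lost unless you print it or store it in a variable. A small sketch with invented variable names:

```python
greeting = 'HELLO'

lowered = greeting.lower()   # .lower() returns a new string ...
print(lowered)
print(greeting)              # ... and leaves the original untouched

greeting.lower()             # the returned value is discarded if you do not store or print it
print(greeting)
```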

In [47]:
print('HELLO'.lower())
print(len('HELLO'))

hello
5


Of course, these methods can also be applied to variables:

In [48]:
word = 'HELLOOOOOO'
print(word.lower())
print(len(word))

helloooooo
10


Python comes with many useful **tools for text processing**. You can list and inspect them with `dir()` or `help()` functions.

In [49]:
book = 'Pride and Prejudice' # Let's pretend we stored a whole book in this variable

`dir()` shows all the methods you can apply to the string variable `book`. Please scroll down. You can ignore the elements starting with double underscores.

In [50]:
dir(book)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

All these methods allow you to do things with strings. Some of the most useful tools are
- `split()`
- `lower()`
- `len()` (a built-in function rather than a string method)
- `find()`

## .split()

#### --Exercise--

Inspect the example below, and figure out how the `split()` method works.

In [51]:
print('1,2,3,4,5'.split(','))
print('Hello how are you today?'.split(' '))

['1', '2', '3', '4', '5']
['Hello', 'how', 'are', 'you', 'today?']


`.split()` converts a string of characters to a **list of words** (approximately; we come back to lists later on in this course): it **returns** a list of items separated by the delimiter (the split character).

Let's have a closer look at the output of this method in the exercise below. As in the code cells above, we can save the output of a `.split()` in a new variable. 

In [66]:
csv = '1,2,3,4,5'
numbers = csv.split(',')
print(numbers) 


['1', '2', '3', '4', '5']


#### --Exercise--

- `split()` the `sentence` variable on whitespace (pass `' '` as the separator, or pass nothing at all: `split()`)
- assign the output to a new variable `words`
- get the last item of the `words` list using index notation.

In [77]:
sentence = "Alice was beginning to get very tired of sitting by her sister on the bank."
# Insert your code here

print(sentence.split())
words = sentence.split() 
print (words[14])
print (words[-1])



['Alice', 'was', 'beginning', 'to', 'get', 'very', 'tired', 'of', 'sitting', 'by', 'her', 'sister', 'on', 'the', 'bank.']
bank.
bank.


#### --Exercise--

For more information, print the Python **documentation** on the `.split()` method using the `help` function.

In [76]:
# search for help here
name = "Kaspar"
help(name.split)
# or
help(str.split)

Help on built-in function split:

split(...) method of builtins.str instance
    S.split(sep=None, maxsplit=-1) -> list of strings
    
    Return a list of the words in S, using sep as the
    delimiter string.  If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or is None, any
    whitespace string is a separator and empty strings are
    removed from the result.

Help on method_descriptor:

split(...)
    S.split(sep=None, maxsplit=-1) -> list of strings
    
    Return a list of the words in S, using sep as the
    delimiter string.  If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or is None, any
    whitespace string is a separator and empty strings are
    removed from the result.
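
The documentation above mentions two details that are easy to miss: what happens when no separator is given, and the optional `maxsplit` argument. A minimal sketch on a made-up string with repeated spaces (all variable names are illustrative):

```python
messy = 'Hello   how are  you?'   # note the repeated spaces

by_space = messy.split(' ')       # every single space is a split point, so empty strings appear
by_whitespace = messy.split()     # no argument: runs of whitespace count as one separator
split_once = messy.split(' ', 1)  # maxsplit=1: split only at the first space

print(by_space)
print(by_whitespace)
print(split_once)
```

For splitting running text into words, calling `split()` without arguments is usually the safer choice.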



## .lower()

#### --Exercise--

Experiment with the `lower()` method.
- Create a string variable;
- Assign the lowercased string to a new variable;
- Print the lowercased and the original variable.

In [87]:
# Experiment with lower
# Declare a string variable with capitals
name = 'YIFAN FENG'
print (name.lower())


# Look for documentation on `lower`

type (name)
help (str.lower)


# Apply lower to the variable AND assign the lowercased string to a new variable

Name = name.lower()
print (Name)

# print the variables before and after applying the lower method



yifan feng
Help on method_descriptor:

lower(...)
    S.lower() -> str
    
    Return a copy of the string S converted to lowercase.

yifan feng


#### --Exercise--

- Lowercase the sentence
- Split by the character `a`

In [89]:
sentence = "Alice was beginning to get very tired of sitting by her sister on the bank."
# Insert code here
print (sentence.lower().split('a'))

['', 'lice w', 's beginning to get very tired of sitting by her sister on the b', 'nk.']


## .find()

Run the cell below to understand what the `.find()` method does.

In [90]:
help(str.find)

Help on method_descriptor:

find(...)
    S.find(sub[, start[, end]]) -> int
    
    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Return -1 on failure.
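
To make the help text a bit more concrete, here is a small sketch on the string "Monty Python" (the variable names are invented for this example). It shows the optional `start` argument and the `-1` that signals "not found".

```python
title = 'Monty Python'

first_o = title.find('o')                 # lowest index where 'o' occurs
second_o = title.find('o', first_o + 1)   # start searching just after the first match
missing = title.find('z')                 # 'z' does not occur, so find() returns -1

print(first_o, second_o, missing)
```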



#### --Exercise--

Find the position of the first 'e' in the title "Naturkatastrophenkonzert".

In [91]:
title = 'Naturkatastrophenkonzert'
# use the find() method here
title.find('e')

15

#### --Exercise--

The code cell below downloads [Romeo and Juliet](http://www.gutenberg.org/cache/epub/1777/pg1777.txt) from Project Gutenberg.

In [92]:
import requests # run but ignore these lines
randj = requests.get('http://www.gutenberg.org/cache/epub/1777/pg1777.txt').text

`randj` contains the full text of Romeo and Juliet; you can inspect the variable by printing the first hundred characters.

#### --Exercise--

Print the first hundred characters of Romeo and Juliet.

In [93]:
# Insert code here
print(randj[:100])  # indexes 0 up to (but not including) 100: exactly 100 characters


This Etext file is presented by Project Gutenberg, in
cooperation with World Library, Inc., from


Find the **first** occurrence of the word **`love`** in Shakespeare's Romeo and Juliet.

**HINT**: Do not forget to first lowercase all words!

In [102]:
# Insert code here
X = randj.lower().find('love')
print (X)

10755


You can print the context around `first_love` using the [index](https://www.oreilly.com/learning/how-do-i-use-the-slice-notation-in-python) notation. (Please follow link for more information.)

In [109]:
first_love = 10759+166 # 10925: index of the 'love' shown below (the first match found above is at 10755)
context_size = 50 # the number of characters around the word
start_at = first_love-context_size # indicate the starting position
stop_at = first_love+context_size+len('love') # indicate where to stop
print('Start printing at character with position=',start_at)
print('Stop printing at character with position=',stop_at)
print('\n')
print(randj[start_at:stop_at]) # print with context

Start printing at character with position= 10875
Stop printing at character with position= 10979


e.
    The fearful passage of their death-mark'd love,
    And the continuance of their parents' rage,
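
The same recipe can be written without hard-coded positions. The sketch below is only an illustration: a short sample sentence stands in for the full play, and `text`, `keyword` and `position` are names invented for this example. The `max(0, ...)` guard is there because a negative start index would count from the end of the string instead of stopping at the beginning.

```python
# A short sample sentence stands in for the full text of the play.
text = 'Two households, both alike in dignity, in fair Verona, where we lay our scene.'
keyword = 'Verona'
context_size = 10

position = text.find(keyword)
start_at = max(0, position - context_size)         # never go below index 0
stop_at = position + len(keyword) + context_size   # a slice may safely run past the end
print(text[start_at:stop_at])
```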


#### \*\*\*--Exercise--

- Can you find the **second** occurrence of **"love"** in this play? And print the context?
- Can you print the second occurrence of love with 50 characters of context?

HINT: Inspect the `help()` function. Reuse information from the above code cells (`first_love`).
HINT II: Use slicing to print the local context of a word.

In [110]:
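# add and copy-paste your code here
text = randj.lower()
first_love = text.find('love')  # position of the first match
# start searching again just after the first match
second_love = text.find('love', first_love + len('love'))
print(second_love)

10925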


# Recap

- Variables are boxes in which you can store information.
- Variables can be of different types: text (strings) or numbers (integers).
- Methods/functions allow you to manipulate the content of these boxes (e.g. `.lower()`).

In [3]:
# Experiment a bit here
X = 'Yifan Feng'
A = '1'
X.lower()
print (A)

1


## len()

`len()` counts the number of elements the argument contains. If you pass a string as an argument, it counts the number of characters.

Note: the syntax is slightly different here (for reasons that fall outside the scope of this course).

In [111]:
word = 'supercalifragilisticexpialidocious'
print(len(word))

34


In [113]:
# How many characters does your full name contain?
name = 'yifanfeng'
print (len(name))

9


#### --Exercise--

How many characters and words does Romeo and Juliet contain (approximately)?
> HINT: Use `split()` and `len()` in combination.
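
As a hedged illustration of the hint, here is the combination applied to a short stand-in sentence rather than the whole play (the names `n_characters` and `n_words` are invented for this example):

```python
sentence = 'Alice was beginning to get very tired'   # short stand-in for a longer text

n_characters = len(sentence)        # counts every character, spaces included
n_words = len(sentence.split())     # counts the items in the list returned by split()

print(n_characters, n_words)
```

The word count is approximate, because `split()` simply cuts on whitespace and leaves punctuation attached to the words.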

In [114]:
# download Romeo and Juliet from Gutenberg
import requests
randj = requests.get('http://www.gutenberg.org/cache/epub/1777/pg1777.txt').text
# add your code here
print(len(randj.split()))

27689


### --Exercise--

Can you find other useful string methods?

In [None]:
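# if yes, play with them here!
print('hello'.upper())    # uppercase every character
print('Hello'.lower())    # lowercase every character
print('hello'.find('l'))  # index of the first 'l'
# https://www.w3schools.com/python/python_ref_string.asp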

## We are DONE for today. Congratulations!