# Python from ground zero workshop

Welcome to the Python workshop. In this notebook, we'll go step by step to build a solid foundation in Python. 

# Part 1 - Learning Colab! 

In this part we'll learn the Colab notebook interface.

Colab, or "Colaboratory", allows you to write and execute Python in your browser. We write code in snippets called "cells", execute them, and see the results just below.

Let's get the hang of it, shall we?

Type in the following box: `42` 
Then press the ▶️ button on the left

We entered an expression, the simplest expression there can be: a number. The Python interpreter evaluated it and gave us a result. 

Since a number needs no further evaluation, Python simply gives back what we gave it: `42`

*(Secret to programming: Programs do whatever us humans tell them to do)*

Now type: `42 + 90` (or any other mathematical expression)

Now the Python interpreter evaluated the mathematical expression and gave us its result.

Try entering a text now: `'I love Python!'`

*Hint: You can also use the shortcut Shift+Enter to execute a cell quickly!*

Now let's get fancy. Enter the command: `print("Hello World!")`

You can write as many lines as you want. However, unless you explicitly call `print`, only the value of the last expression will show up in the result. Let's try below.
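As a small sketch of this behaviour (the variable names `first` and `second` are made up for illustration), a cell with several lines might look like this:

```python
first = 1 + 1               # evaluated, but nothing is displayed for this line
print("first is", first)    # print always shows its output
second = first * 10
second                      # in a notebook cell, only this last expression is echoed
```

Run as a plain script outside a notebook, the last line would show nothing at all; only the `print` call produces output there.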

Now let's give ourselves a high five for writing our first script! 

# Part 2 - Variables

Variables are where we keep information in programming. Creating one is a way of asking the computer to set aside some space for us to store things. 

To create a variable, all we need to do is give it a name and a value. It is as simple as writing an equation:

`mybrandnewvariable = 5`

Once it is defined like this, the variable name will be our key for reaching this value. 

The value of a variable is what's kept INSIDE the memory. You can think of the variable name as the key of a box and the value as what's inside the box. 

A variable's name stays the same, but we can change its value whenever we want. Once we modify it, the variable keeps the updated value. 

`mybrandnewvariable = 9`

Let's do a print using a variable. 

Here, we stored the text `Hello world!` inside a variable called `my_string_variable`. 

Then, we printed referencing that variable. 

Note that we specified the text within quotation marks `"..."` but not when we reference the variable. 

Let's see the difference

You see, Python interprets whatever is inside quotation marks as text. Since we used quotation marks here, Python didn't look for a variable named `my_string_variable`. Instead, it just treated it as regular text. 
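The two cells described above are not shown in this notebook, so here is a minimal sketch of what they might have contained, assuming the variable name and text mentioned in the explanation:

```python
my_string_variable = "Hello world!"

print(my_string_variable)      # no quotes: Python looks up the variable and prints its value
print("my_string_variable")    # quotes: Python prints the text exactly as written
```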

Although we are free to write text any way we want, variable names must follow some rules:
- A variable name must begin with a letter or an underscore.
- A variable name can ONLY consist of letters, numbers and underscores (NO SPACES)
- Variable names ARE case sensitive (`my_variable` and `MY_Variable` would be two different variables)
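If you are unsure whether a name follows these rules, one way to check is the built-in string method `isidentifier()`. The candidate names below are made up for illustration. Note that reserved words such as `for` also pass this check even though they cannot be used as variable names.

```python
# Each call returns True when the text follows the naming rules above
print('my_variable'.isidentifier())   # letters and underscores only
print('_hidden'.isidentifier())       # may start with an underscore
print('2nd_name'.isidentifier())      # starts with a number, so not allowed
print('my variable'.isidentifier())   # contains a space, so not allowed

# Case sensitivity: these are two separate variables
my_variable = 1
MY_Variable = 2
print(my_variable, MY_Variable)
```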


## Variable types

There are various types of variables. For example:

- number variable: `a = 5`
- text/string variable: `v = "some text"`
- lists: `l = [1,2,3,4]`
- boolean: `b = True`

We can see what type a variable has by using `type`. 
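As a quick sketch, here is `type` applied to the four example variables from the list above:

```python
a = 5
v = "some text"
l = [1, 2, 3, 4]
b = True

print(type(a))   # int
print(type(v))   # str
print(type(l))   # list
print(type(b))   # bool
```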

`str` stands for string. That's what programmers call a sequence of characters, that is, any textual data. 

Also, have you noticed that we referenced a variable that we defined in another cell? 

The Python session keeps running while we work, so it remembers every variable we define. 

## Numbers

A whole number is stored as an `int`, short for integer. 

A decimal number is stored as a `float`, short for floating point value.

Execute the following cell to see the different types of numbers

In [None]:
num1 = 42
num2 = 5.111
print("Type of num1 is", type(num1))
print("Type of num2 is", type(num2))

# Challenge 1 - Variable arithmetics

Define two number variables with values `9` and `12` and then get their sum, difference, quotient and product using the following operators:

| Symbol | Task Performed |
|----|---|
| +  | Addition |
| -  | Subtraction |
| /  | Division |
| %  | Modulo (remainder) |
| *  | Multiplication |
| //  | Floor division |
| **  | To the power of |
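To see every operator from the table in action without giving away the challenge, here is a short sketch using two other numbers, `7` and `2`, chosen just for illustration:

```python
a = 7
b = 2

print(a + b)    # addition
print(a - b)    # subtraction
print(a / b)    # division always gives a float
print(a % b)    # remainder after dividing 7 by 2
print(a * b)    # multiplication
print(a // b)   # floor division drops the decimal part
print(a ** b)   # 7 to the power of 2
```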

# Part 3 - String operations

Many times we need to manipulate the text we're working with. There are plenty of operations in Python we can use for our tasks.

## String concatenation 

You can concatenate (merge) two strings with the `+` operator. 

Try executing the following expression: `"Hello" + "World"`

Oops, forgot a space there. Add it yourself in the cell below:

## Indexing

We said earlier that strings are a sequence of characters. Python allows us to get the character in the position we want using indices (`[x]`).

Try executing the code below: 

In [None]:
mystr = "localization"
mystr[5]

`mystr[5]` gave us the fifth character in the string `localization` which is `l`... wait a second

We didn't get `l` but instead `i`. Why?

Well, it's because the indices in Python (and many other programming languages) start from 0. 

So... `0`-`l`, `1`-`o`, `2`-`c`, `3`-`a`, `4`-`l`, `5`-`i`... we got `i`!

## String length

You can get the length of a string using the `len(str)` function.

Try getting the length of `mystr` below:

# Challenge 2 - String arithmetics #1

Obtain the character in the middle of `localization` using the following tools:
- `len(str)` for getting the length of a string
- `//` floor division operator (an index must be a whole number, and `/` always returns a `float`)
- `[x]` index operator

You can define new variables if needed.

## `str.replace`

Let's see what `str.replace` does by using `help` 

Now let's use `str.replace` to replace `Mr.` with `Dr.` in the following string

In [None]:
myname = "Mr. Jekyll"
myname.replace("Mr.", "Dr.")

Let's print `myname` now...

In [None]:
print(myname)

It didn't change. Why? 

It's because the change is not made in place: strings in Python are immutable. If you check the help above, you'll see that `replace` returns a copy of the string with the replacements made and leaves the original untouched. 

If we want to keep the modified text, we need to assign the result to a variable.

In [None]:
#Defining new variable


It also replaces all occurrences of the given substring.

In [None]:
"Mr.Jekyll and Mr.Hyde".replace("Mr.", "Dr.")

## `str.format()`

This is used to place values inside a string: each `{}` placeholder is filled with the matching argument. 

In [None]:
secret = '{} is cool!'.format('Python')
print(secret)
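`format` also accepts several values, and placeholders can be given names. A small sketch with made-up example strings:

```python
# Placeholders are filled in order, left to right
summary = '{} + {} = {}'.format(2, 3, 2 + 3)
print(summary)

# Named placeholders are filled by keyword
named = '{language} was first released in {year}'.format(language='Python', year=1991)
print(named)
```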

In [None]:
#Introduce yourself using format


## `str.join()`

This joins a list of strings into a single string, placing the separator string between the items.

In [None]:
str1 = 'pandas'
str2 = 'numpy'
str3 = 'requests'
cool_python_libs = ', '.join([str1, str2, str3])

In [None]:
print('Some cool python libraries: {}'.format(cool_python_libs))

## `str.upper(), str.lower(), str.title(), str.capitalize()`

These methods are very useful when cleaning text

In [None]:
mystr = 'pyTHoN hackER'

In [None]:
mystr.upper()

In [None]:
mystr.lower()

In [None]:
mystr.title()

In [None]:
mystr.capitalize()

## `str.strip`

Strips out sneaky whitespace at the beginning or end of the string. 

Note that `\t` stands for a tab and `\n` stands for a new line in Python. (For more info see [Escape characters](http://python-reference.readthedocs.io/en/latest/docs/str/escapes.html#escape-characters).)

In [None]:
my_crooked_str = "   \t THis Is a \tTeRRIble sTRing \n"

In [None]:
print(my_crooked_str)

In [None]:
slightly_better = my_crooked_str.strip()
print(slightly_better)

We can chain operations by calling them one after the other

In [None]:
slightly_better = my_crooked_str.strip().lower()
print(slightly_better)

## `str.split()`

This is used to split a string at the delimiters we choose. Let's see what `help` says about it. 

In [None]:
help(str.split)

In [None]:
sentence = 'three different words'
words = sentence.split()
print(words)

As the description says, it returns a list. We'll get into lists later. 

We could use other delimiters too.

In [None]:
another_sentence = 'three, different, words'
#Split at each comma followed by a space
words = another_sentence.split(', ')
print(words)

# Challenge 3 - String arithmetics #2

Let's use what we learned to make this horrible string truly beautiful. 

Let's process `my_crooked_str` so that it looks like this:

`"This is a beautiful string."`

*(Don't forget the period at the end)*

In [None]:
my_beautiful_string = my_crooked_str._______
print(my_beautiful_string)

# Part 3 - Conditionals

Welcome to conditionals, the most fundamental decision mechanism in coding. They are a way of steering our program based on a condition.

But before we get into that, I'll introduce another variable type called boolean (named after [George Boole](https://en.wikipedia.org/wiki/George_Boole)).

A boolean is a binary variable, that is, it can be either "on" or "off", 1 or 0, `True` or `False`. 

In [None]:
my_boolean = True
print(my_boolean)
print(type(my_boolean))

We can negate a boolean variable by using the `not` operator

In [None]:
my_boolean = True
another_boolean = not my_boolean
print(another_boolean)

Now let's build our first conditional using `if`. 

In [None]:
statement = True
if statement:
    print('statement is True')
    
if not statement:
    print('statement is not True')

Note the indentation under the `if` statement: Python uses it to tell which lines belong to the `if` block

`if...else` also comes in handy

In [None]:
statement = True
if statement:
    print('statement is True')
else:
    print('statement is not True')

Let's say we want to decide what to wear depending on the temperature given in degrees Celsius.

To decide based on a number, we'll need the following:

| Symbol | Task Performed |
|----|---|
| >  | Greater than |
| <  | Less than |
| >=  | Greater than or equal to |
| <=  | Less than or equal to |
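Each of these comparisons produces a boolean, which is exactly what `if` expects. A small sketch with a made-up `speed` variable:

```python
speed = 50

print(speed > 30)    # True, because 50 is greater than 30
print(speed < 30)    # False
print(speed >= 50)   # True, because the values are equal
print(speed <= 49)   # False

# The result can be stored like any other value
is_fast = speed >= 50
print(type(is_fast))
```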

In [None]:
temperature = 15

#if...else


If we have more than one condition then we use `if...elif...else`

In [None]:
temperature = 12

#if...elif...else


We can combine conditions with `and` and `or`
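As a sketch with made-up variables (`age` and `has_ticket` are not part of the exercises): `and` is true only when both sides are true, while `or` is true when at least one side is.

```python
age = 70
has_ticket = False

# Both conditions must hold
can_enter = age >= 18 and has_ticket
# At least one condition must hold
gets_discount = age < 12 or age >= 65

if can_enter:
    print('Welcome in!')
else:
    print('Sorry, you need a ticket')

if gets_discount:
    print('You get a discount')
```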

In [None]:
temperature = 25
sunny = True

#combined if


In [None]:
temperature = 12
rainy = True

#combined if with OR

`==` and `!=` are used to check whether two expressions are equal or not equal

In [None]:
language = 'en'

if language == 'fr':
  print("Bonjour!")
elif language == 'en':
  print("Good morning!")
elif language == 'tr':
  print("Günaydın")

In [None]:
password = '12345'
user_input = '12345'

#Warn if password is wrong using !=


# Challenge 4 - Conditionals

Replace each `____` with the proper condition

In [None]:
name = 'George Boolean'

if ____:
    print('Name "{}" is more than 20 chars long'.format(name))
    length_description = 'long'
elif ____:
    print('Name "{}" is more than 15 chars long'.format(name))
    length_description = 'semi long'
elif ____:
    print('Name "{}" is more than 10 chars long'.format(name))
    length_description = 'semi long'
elif ____:
    print('Name "{}" is 8, 9 or 10 chars long'.format(name))
    length_description = 'semi short'
else:
    print('Name "{}" is a short name'.format(name))
    length_description = 'short'

# Part 4 - Lists and dictionaries

Lists and dictionaries make it super easy to keep a set of values tidy in one single variable. 

## Lists

A list is defined as follows:

`my_list = [1,2,3,4,5,6]`

If we want to read the third element, we just need to write:

`my_list[2]`

*(Note that I put 2 there because, remember, the first element is at index 0)*

Try creating a list and printing one of its elements

A list can be empty...

In [None]:
new_list = []

if not new_list:
  print("My list is empty :(")

And if we want to add elements to or remove elements from a list we already defined, we can use the `append` and `remove` methods
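Here is a minimal sketch on a separate, made-up list called `shopping`, so that the exercise cell stays yours to fill in:

```python
shopping = ['milk', 'eggs']

shopping.append('bread')   # adds the value to the end of the list
print(shopping)

shopping.remove('milk')    # removes the first element equal to 'milk'
print(shopping)
```

Note that `remove` looks for a value, not a position, and raises an error if the value is not in the list.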

In [None]:
#append and remove


print(new_list)

The length of a list can be obtained using the `len` function

In [None]:
print('list: {}, size: {}'.format(new_list, len(new_list)))

We can modify or remove an element too

In [None]:
my_list = [0, 1, 2, 3, 4, 5]
print(my_list)

print("\nModify value at index 0 to 99")
________
print(my_list)

print("\nRemove first value using del")
______
print(my_list)

It's very useful to check whether a value we want is in the list. 

For that we can use `in`

In [None]:
languages = ['en', 'fr', 'tr', 'ar', 'sw']

if 'tr' in languages:
  print("Turkish is in the list")

if 'de' in languages:
  print("German is in the list")
else:
  print("German is not in the list")

if 'ro' not in languages:
  print("Romanian is not in the list")

And finally, we can sort a list using `sort`

In [None]:
numbers = [8, 1, 6, 5, 10]
numbers.sort()
print('sorted numbers:', numbers)

numbers.sort(reverse=True)
print('numbers reverse sorted:', numbers)

words = ['this', 'is', 'a', 'list', 'of', 'words']
words.sort()
print('words:', words)

Merging lists is as easy as 1, 2, 3

In [None]:
old_star_wars = [4,5,6]
new_star_wars = [1,2,3]

all_star_wars = ___________
print(all_star_wars)

# Challenge 5 - Lists

Define an empty list called `challenge`. Add the elements:
- `I`
- `am`
- `learning`
- `Python`
- `today`

Replace the element `learning` with `loving`

Remove the element `today`

Sort the list

In [None]:
#Execute this cell to see if you did everything correctly
assert challenge == ['I', 'Python', 'am', 'loving']

## Dictionaries

Dictionaries are key-value pairs. 

A dictionary is defined with curly brackets

In [None]:
my_dict = {'a':'apple', 'b':'banana', 'c':'cherries'}
print(my_dict)
print(type(my_dict))

We can read the elements using the keys that we defined

cherries


Or modify them

{'a': 'avocado', 'b': 'banana', 'c': 'cherries'}
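The cells that produced the two outputs above are not shown. A sketch that is consistent with them, assuming the `my_dict` defined earlier, could be:

```python
my_dict = {'a': 'apple', 'b': 'banana', 'c': 'cherries'}

# Read a value by putting its key in square brackets
print(my_dict['c'])

# Assign to an existing key to overwrite its value
my_dict['a'] = 'avocado'
print(my_dict)
```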


# Part 5 - Loops

Sometimes there are so many values that we need to automate their processing. Loops come in handy when we want to perform a repeated task. 

We usually iterate through a list's items. 

The keyword conventionally used in programming is `for`. 

When we say `for item in my_list`, we get the elements of `my_list` one by one in a variable called `item`, which holds a different element at each step of the loop.
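As a sketch of the idea on a made-up list called `colors`, the indented lines run once for every element:

```python
colors = ['red', 'green', 'blue']
shouted = []

for item in colors:
    # item holds one element of the list at each step
    print(item)
    shouted.append(item.upper())

print(shouted)
```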

Let's do a simple printing example

In [None]:
my_list = [1, 2, 3, 4, 'Python', 'is', 'neat']
#Print items in list

# Challenge 6 - loops and dictionaries

Given our list of fruits, transfer them into a dictionary using their first letters as key 

For example:

Given list `['apple', 'pear', 'cherry']`

Make dictionary `{'a':'apple', 'p':'pear', 'c':'cherry'}`

In [None]:
#Fill in the blanks
fruits = ['cantaloop', 'cherries', 'mango', 'papaya', 'pomegranade']
fruit_dictionary = {}

for ____ in fruits:
  first_letter = _____
  fruit_dictionary[first_letter] = _____

print(fruit_dictionary)

In [None]:
#Execute this to see if you get it right
assert fruit_dictionary == {'c': 'cherries', 'm': 'mango', 'p': 'pomegranade'}

# Challenge 6b - List inside dictionary (Home exercise)

As you may have noticed, because the keys clashed in the previous exercise, we lost some of the fruits that were in the original list. 

Edit the code so that all fruits sharing a first letter are kept under the same key. 

The final dictionary should look like this:

`fruit_dictionary = {'c': ['cantaloop', 'cherries'], 'p': ['papaya', 'pomegranade'], 'm': ['mango']}`

# Part 6 - Functions

A function is a reusable procedure that performs a specific task. It's very useful when we're going to perform one task repeatedly or in different parts of our code. 

Functions are declared with the keyword `def` followed by the function name, the function parameters in parentheses and a colon

```
def my_function(parameter1, parameter2):
  #Stuff that my function will do
  pass
```

Let's write a function that we can use to greet 

In [None]:
#Greet function


Now let's use our name with it by passing it as a parameter
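One possible version of such a function is sketched below. The name `greet` and the wording of the message are just assumptions, so feel free to write your own:

```python
def greet(name):
    # name is filled in with whatever we pass when calling the function
    print('Hello, {}! Welcome to the workshop.'.format(name))

greet('Ada')
greet('George')
```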

Functions can return values too.

Let's make a function that returns the square of the number we give it.
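A minimal sketch of such a function, assuming we call it `square`:

```python
def square(number):
    # return hands the result back to whoever called the function
    return number * number

result = square(4)
print(result)
```

Unlike `print`, `return` lets us store the result in a variable and keep working with it.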

See, when we have a long list and want to apply our function to all of its elements, we can just create a loop and call our function to do its magic

In [None]:
my_numbers = [1,2,3,4,5,6,7]
my_numbers_squared = []

for...

# Challenge 7 - Searching for wanted people

Implement `find_wanted_people` function which takes a list of names (strings) as argument. The function should return a list of names which are present both in `WANTED_PEOPLE` and in the name list given as argument to the function.

In [None]:
WANTED_PEOPLE = ['John Doe', 'Clint Eastwood', 'Chuck Norris']

In [None]:
def find_wanted_people(people_list):
  #Implement your function here
  pass

In [None]:
#Execute this cell to see if you did it right
people_to_check1 = ['Donald Duck', 'Clint Eastwood', 'John Doe', 'Barack Obama']
wanted1 = find_wanted_people(people_to_check1)
assert len(wanted1) == 2
assert 'John Doe' in wanted1
assert 'Clint Eastwood' in wanted1

people_to_check2 = ['Donald Duck', 'Mickey Mouse', 'Zorro', 'Superman', 'Robin Hood']
wanted2 = find_wanted_people(people_to_check2)
assert wanted2 == []

# Part 7 - Packages

What we presented above are the core basics of Python. You might be wondering how these simple tools can help with our complicated tasks. 

There seems to be a huge gap, right? 

Well, yes, there is, but you don't have to worry that much. Another miracle of Python is that it is modular. 

What does modular mean?

It means that if I want to perform task X, it is highly probable that someone out there has already written code for it that I can just plug into my own. 

In Python, this is standardized through packages. 

Ever heard of the saying *"There's an app for it"?*

Well, in Python **"There's a package for it!"**

There are thousands of packages that can help you solve tasks like:

- Reading and manipulating Google spreadsheets
- Processing TMX, XML etc.
- Cleaning text of emoji etc.
- Building a machine translation model (more on this on Friday!)
- Whatever you might imagine...

Some of these packages come readily with our Python installation. 

And the rest can be found on sites like https://pypi.org/ and installed with a single command. 
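The packages that ship with Python need no installation, only an `import`. As a small sketch, here are two of them, `math` and `statistics`, picked just as examples:

```python
import math
import statistics

# math provides common mathematical functions
root = math.sqrt(16)
print(root)

# statistics provides basic descriptive statistics
average = statistics.mean([2, 4, 6])
print(average)
```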

# Final challenge

Now that we're Python experts, we can take on some real challenges, right??

In this final challenge, we're going to fix some badly formatted documents which a client has asked us to translate. 

You can download them from this link:...

Next, we're going to put them in our Colab workspace using the left menu. 

It is always good to think about the task before embarking on any coding. 

A programmer always thinks in terms of input and output. Let's define them first:

INPUT: 2 docx Word documents with badly formatted text

OUTPUT: 2 docx Word documents with format-corrected text

Once we know the beginning and ending of our procedure, we can start thinking about the intermediate steps.

1. ___
2. ___
3. ___

To accomplish these tasks, two Python packages come to our aid:

1. `clean-text` for normalizing text https://pypi.org/project/clean-text/
2. `python-docx` (imported as `docx`) for reading and writing Word documents https://python-docx.readthedocs.io/en/latest/index.html

We're going to get help from this tutorial when using `docx`: https://tech-cookbook.com/2019/10/21/how-to-work-with-docx-in-python/

These two packages are not included in the standard Python distribution, so we have to install them. We can install a package using this command:

`!pip install <package-name>`

Let's install these two packages: 

In [None]:
!pip install clean-text

In [None]:
!pip install python-docx

Let's play around with the `clean-text` package. We can see its basic usage on its website. 

In the same way, let's familiarize ourselves with the `python-docx` package. 

Let's start by writing a function that reads lines of a document into a list
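One way this could look is sketched below. The function names `paragraph_texts` and `read_docx_lines` are made up for this example. The sketch assumes `python-docx` is installed and treats each paragraph of the document as one line. A stand-in object is used at the end so the cell runs even before the real documents are uploaded.

```python
from types import SimpleNamespace

def paragraph_texts(document):
    """Return the text of every paragraph in a python-docx style document."""
    return [paragraph.text for paragraph in document.paragraphs]

def read_docx_lines(path):
    """Open a .docx file and return its paragraphs as a list of strings."""
    import docx  # provided by the python-docx package installed above
    return paragraph_texts(docx.Document(path))

# Stand-in for a real document: it only mimics the .paragraphs and .text attributes
fake_document = SimpleNamespace(paragraphs=[SimpleNamespace(text='First line'),
                                            SimpleNamespace(text='Second line')])
lines = paragraph_texts(fake_document)
print(lines)
```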

Now let's create another function that processes the strings in a list and outputs another list
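Here is a hedged sketch of such a function. It uses only the built-in string methods from Part 3 as a stand-in for the real cleaning step, which in the workshop would rely on `clean-text`. The name `tidy_lines` and the sample data are made up.

```python
def tidy_lines(lines):
    """Collapse repeated whitespace and drop lines that end up empty."""
    tidy = []
    for line in lines:
        # split() without arguments splits on any run of spaces, tabs or newlines
        cleaned = ' '.join(line.split())
        if cleaned:
            tidy.append(cleaned)
    return tidy

messy = ['  Hello \t world ', '', 'Second   line\n']
print(tidy_lines(messy))
```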

And finally, let's write a function that creates a document from our list of strings and writes it to a file
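A possible sketch, again assuming `python-docx` is installed. The name `write_docx_lines` is made up, and each string becomes one paragraph in the new document:

```python
def write_docx_lines(lines, path):
    """Create a new .docx file at path with one paragraph per string."""
    import docx  # provided by the python-docx package installed above
    document = docx.Document()        # start from an empty document
    for line in lines:
        document.add_paragraph(line)  # each string becomes its own paragraph
    document.save(path)               # write the file to disk
```

Calling it with a list of strings and a file name of your choice creates the file in the Colab workspace. Keep in mind that this simple version does not carry over styles such as bold text or headings.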

OK! So far so good, now let's perform the whole process for a document and see its results! 

And now, we can extend the same code to process ALL documents! 

Voila! Our documents are now ready to be translated! 