#**Programming in Python**

---



##Variables

![variables](https://mw.home.amu.edu.pl/zajecia/PPR2017/var.gif "variables")


##Variable types
Text values such as <font color='red'>"miserables.txt"</font> appear in red with quotes, while numbers such as <font color='green'>40</font> appear in green without quotes. The difference in display reflects a difference in data type.

| Data Type       | Explanation          | Example  |
| ------------- |:-------------:| -----:|
| String     | Text | ```"lemonade"``` |
| Integer     | Whole Numbers      |   <font color='green'>```40```</font> |
| Float | Decimal Numbers      |   <font color='green'>```40.2```</font> |
| Boolean | True/False     |   <font color='green'>**```False```**</font> |

We can check the type of any value with the built-in `type()` function.

In [None]:
type("lemonade")

In [None]:
type(40)
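
As a small illustrative sketch, the same check can be run on every type in the table above. The variable names `text_forty`, `number_forty` and `converted` are made up for this example.

```python
# Check the type of each kind of value listed in the table.
print(type("lemonade"))  # <class 'str'>
print(type(40))          # <class 'int'>
print(type(40.2))        # <class 'float'>
print(type(False))       # <class 'bool'>

# Quotes matter: "40" is text, while 40 is a number.
text_forty = "40"
number_forty = 40
print(type(text_forty) == type(number_forty))  # False

# int() converts a numeric string into an integer.
converted = int(text_forty)
print(converted + 2)  # 42
```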

##Indentation in Python

<img src='https://drive.google.com/uc?export=view&id=1_7lZJ4qI5K9Ha0B_nRjt6qw4klt7opYg' width="1000">
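
A minimal sketch of what indentation does in practice: Python uses it, rather than braces, to decide which lines belong to a block. The variable names here are illustrative only.

```python
word = "lemonade"

if len(word) > 5:
    # These lines are indented by four spaces, so they belong to the `if` block.
    label = "long word"
    print(word, "is a", label)

# This line is not indented, so it runs whether or not the condition was true.
print("done")
```

Removing the indentation from the lines under `if` would raise an `IndentationError`, because the `if` statement would have no body.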

##Functions
We are going to work with text objects and do several things with them, such as lemmatizing and part-of-speech tagging.
To do so, we'll use functions, which let us apply the same operation to those objects repeatedly.

In [None]:
import re

def split_into_words(any_chunk_of_text):
    # Lowercase the text, then split on runs of non-word characters.
    words = re.split(r"\W+", any_chunk_of_text.lower())
    return words

In [None]:
split_into_words("hello I'm a test")
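
To see what the function above returns, here is a self-contained sketch that repeats its logic and runs it on two inputs. The second helper, `split_into_nonempty_words`, is a hypothetical addition and not part of the original notebook; it shows one way to handle the empty string that appears when the text ends with punctuation.

```python
import re

def split_into_words(any_chunk_of_text):
    # Same logic as the function above: lowercase, then split on non-word characters.
    return re.split(r"\W+", any_chunk_of_text.lower())

print(split_into_words("hello I'm a test"))
# ['hello', 'i', 'm', 'a', 'test']

# Text that ends with punctuation leaves an empty string at the end of the list.
print(split_into_words("Hello, world!"))
# ['hello', 'world', '']

def split_into_nonempty_words(any_chunk_of_text):
    # Hypothetical helper: drop the empty strings produced by the split.
    return [word for word in split_into_words(any_chunk_of_text) if word]

print(split_into_nonempty_words("Hello, world!"))
# ['hello', 'world']
```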

<img src='https://drive.google.com/uc?export=view&id=1IN-omiNV2tUXftyqaTBNO8aF-dMQPHiF' width="1000">

<img src='https://drive.google.com/uc?export=view&id=1hKB-GHN95o0E0CLDrBl75CQ2UlTObsJz' width="1000">

##Loops

You rarely want to write out the same action by hand for every item in your data. To lemmatize a whole sentence, for example, you would not call `lemmatize()` separately on each word. Loops repeat an action for you, optionally under a condition. With a list of words to lemmatize, the loop amounts to saying: "as long as there are words left in the sentence, lemmatize the next one".

<img src='https://drive.google.com/uc?export=view&id=1XKcA0ffSy3i6A0S5jWI7LPz7-PbZ1T88' width="1000">

<img src='https://drive.google.com/uc?export=view&id=1yf_wPus-G__l4rjeY5I4vzc3rtdU31JC' width="1000">
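
The two kinds of loop described in this section can be sketched as follows. No lemmatizer is loaded at this point, so `len()` and `.upper()` stand in for the per-word action; the word list is made up for the example.

```python
words = ["les", "miserables", "est", "un", "roman"]

# `for` visits each item of the list in turn.
word_lengths = []
for word in words:
    word_lengths.append(len(word))
print(word_lengths)  # [3, 10, 3, 2, 5]

# `while` repeats as long as its condition holds: here, while words remain.
remaining = list(words)   # work on a copy so `words` is left untouched
processed = []
while remaining:
    processed.append(remaining.pop(0).upper())
print(processed)  # ['LES', 'MISERABLES', 'EST', 'UN', 'ROMAN']
```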

##Conditions

A condition says "do this action only if this state holds".
<br>The basic structure is as follows:

<br>**`if`** `some_first_condition :`
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`do_something()`
<br>**`elif`** `some_second_condition :`
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`do_something_else()`
<br>**`else`** `:`
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`do_a_third_thing()`

<img src='https://drive.google.com/uc?export=view&id=1VYhnaiNah0xsn1We5XR1LeoXnj2SKOJh' width="1000">
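
Here is a runnable sketch of the `if` / `elif` / `else` structure. The function `describe_token` and its three categories are hypothetical, chosen only to illustrate the three branches.

```python
def describe_token(token):
    # Hypothetical helper: classify a token by its content.
    if token.isdigit():
        return "number"
    elif token.isspace() or token == "":
        return "blank"
    else:
        return "word"

for token in ["40", " ", "lemonade"]:
    print(repr(token), "->", describe_token(token))
```

Only the first branch whose condition is true runs; `else` catches everything that matched none of the earlier conditions.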

##Libraries
Many functions have already been written by others and can be imported directly into your Python code. You can install a library in your virtual environment with:
<br>`pip install yourLib`
<br>Once it is installed, load it with `import`. To import only specific names rather than the whole library, use `from ... import ...`.

In [None]:
import re
from collections import Counter
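
The difference between the two import styles can be sketched with standard-library modules only, so nothing needs to be installed. `math` is used here purely as an example module.

```python
import math                      # import the whole module, then use the `math.` prefix
from collections import Counter  # import a single name from a module

print(math.sqrt(16))             # 4.0

# Counter is available directly, without the `collections.` prefix.
letter_counts = Counter("lemonade")
print(letter_counts["e"])        # 2
```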

In the following cell, we download a .txt document to use for basic operations. The text is stored on Google Drive, so we fetch it with the `gdown` command-line tool.

In [None]:
# Recent gdown versions take the file ID directly; versions before 4.3.1 need the --id flag.
!gdown 1GEgd5cQoJkTm5PRWfixxOKHe3uOlxFqo

In [None]:
filepath_of_text = "/content/miserables.txt"
number_of_desired_words = 40

The next cell opens the file at the given path and reads its whole content into a variable (here `full_text`).

In [None]:
with open(filepath_of_text, encoding="utf-8") as text_file:
    full_text = text_file.read()

Now, let's use the function we created earlier (`split_into_words`) to split the text held in this variable into words.

In [None]:
all_the_words = split_into_words(full_text)

The result is a list of words. You can inspect part of it by slicing with indices.

In [None]:
all_the_words[50:60]
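
Slicing follows the pattern `list[start:stop]`, where `start` is included and `stop` is excluded. A short sketch on a made-up list:

```python
sample_words = ["a", "b", "c", "d", "e", "f"]

print(sample_words[0])     # a  (indices start at 0)
print(sample_words[2:4])   # ['c', 'd']  (index 2 included, index 4 excluded)
print(sample_words[:3])    # ['a', 'b', 'c']  (from the beginning)
print(sample_words[-2:])   # ['e', 'f']  (the last two items)
print(len(sample_words))   # 6
```

So `all_the_words[50:60]` returns ten words: those at indices 50 through 59.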

#Some useful tips for strings
These are the basic methods you can use to manipulate string variables.

In [None]:
first_line = full_text[88:155]
print(first_line)

| **String Method** | **Explanation** |
|:-------------:|:---------------------------------------------------------------------------------------------------:|
| `string.lower()` | makes the string lowercase |
| `string.upper()` | makes the string uppercase |
| `string.title()` | makes the string titlecase |
| `string.strip()` | removes leading and trailing whitespace |
| `string.replace('old string', 'new string')` | replaces `old string` with `new string` |
| `string.split('delim')` | returns a list of substrings separated by the given delimiter |
| `string.join(list)` | opposite of split(), joins the elements of the given list together, using the string as the separator |
| `string.startswith('some string')` | tests whether the string begins with `some string` |
| `string.endswith('some string')` | tests whether the string ends with `some string` |
| `string.isspace()` | tests whether the string consists only of whitespace |

                                                            

For example:

In [None]:
print(first_line.replace("était", "est"))
print(first_line.lower())
print(first_line.upper())
print(first_line.title())
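
The remaining methods from the table can be tried the same way. This sketch uses a made-up sample string, `raw_line`, rather than the downloaded text, so that the results are predictable.

```python
raw_line = "  lemonade, tea, coffee  "

# strip() removes the whitespace at both ends.
cleaned = raw_line.strip()
print(cleaned)                       # lemonade, tea, coffee

# startswith() / endswith() / isspace() return booleans.
print(cleaned.startswith("lemon"))   # True
print(cleaned.endswith("tea"))       # False
print("   ".isspace())               # True

# split() cuts the string into a list; join() glues a list back together.
parts = cleaned.split(", ")
print(parts)                         # ['lemonade', 'tea', 'coffee']
rejoined = " | ".join(parts)
print(rejoined)                      # lemonade | tea | coffee
```

Strings are immutable, so each method returns a new string and leaves `raw_line` unchanged.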

You can also split on a specific delimiter, such as a single space.

| **String Method** | **Explanation**                                                                                   |
|:-------------:|:---------------------------------------------------------------------------------------------------:|
| `string.split('delim')`          | returns a list of substrings separated by the given delimiter |                                                       

In [None]:
first_line.split(" ")
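
The choice of delimiter matters when the text contains repeated spaces. The sketch below, on a made-up string, compares splitting on one space, splitting with no argument, and splitting with the regular expression `\s+`.

```python
import re

spaced = "le  petit   chat"   # note the repeated spaces

# Splitting on exactly one space leaves empty strings between consecutive spaces.
print(spaced.split(" "))          # ['le', '', 'petit', '', '', 'chat']

# With no argument, split() treats any run of whitespace as one delimiter.
print(spaced.split())             # ['le', 'petit', 'chat']

# str.split() takes its delimiter literally; regex patterns such as \s need re.split().
print(re.split(r"\s+", spaced))   # ['le', 'petit', 'chat']
```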

##Import data from other modules
Some libraries ship useful lists of strings that you can load and reuse. One of the best known in this regard is `nltk`, which provides stopword lists, for example.

In [None]:
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
french_stopwords = set(stopwords.words('french'))

Here we use the NLTK French stopwords to filter our previous variable `all_the_words`.

In [None]:
meaningful_words = [word for word in all_the_words if word not in french_stopwords]

In [None]:
meaningful_words[50:60]
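
The list comprehension used for filtering is a compact form of a loop with a condition. The sketch below shows both forms side by side. It uses a tiny hypothetical stopword set so that it runs without downloading anything; the notebook itself uses NLTK's full French list.

```python
# Hypothetical miniature stopword set, for illustration only.
tiny_stopwords = {"le", "la", "et", "de"}
sample_words = ["le", "chat", "et", "la", "souris"]

# List comprehension, as in the filtering cell above.
kept_by_comprehension = [word for word in sample_words if word not in tiny_stopwords]

# The same filter written as an explicit loop with a condition.
kept_by_loop = []
for word in sample_words:
    if word not in tiny_stopwords:
        kept_by_loop.append(word)

print(kept_by_comprehension)                   # ['chat', 'souris']
print(kept_by_comprehension == kept_by_loop)   # True
```

Storing the stopwords in a `set` rather than a list keeps each `not in` test fast, which matters when filtering a whole novel.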

You can also count the words easily and rank them by frequency.

In [None]:
meaningful_words_tally = Counter(meaningful_words)

You may not want to see every word. To ask for a specific number of words, use the `number_of_desired_words` variable.

In [None]:
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

In [None]:
most_frequent_meaningful_words
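
To make the shape of the result concrete, here is a sketch of `Counter` and `most_common()` on a short made-up list. The result is a list of `(word, count)` tuples, sorted from most to least frequent.

```python
from collections import Counter

sample_words = ["jean", "valjean", "jean", "cosette", "jean", "cosette"]
tally = Counter(sample_words)

# A Counter behaves like a dictionary mapping each item to its count.
print(tally["jean"])           # 3

# most_common(n) returns the n most frequent items with their counts.
top_two = tally.most_common(2)
print(top_two)                 # [('jean', 3), ('cosette', 2)]

# Unpack the tuples to keep only the words.
top_words = [word for word, count in top_two]
print(top_words)               # ['jean', 'cosette']
```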