# Python the basics: datatypes

> *DS Data manipulation, analysis and visualisation in Python*  
> *September, 2020*



## Importing packages

Importing packages is usually the first thing you do in Python, since packages provide the functionality you work with.

Different options are available:

* <span style="color:green">import <i>package-name</i></span>  <p> importing all functionalities as such
* <span style="color:green">from <i>package-name</i> import <i>specific function</i></span>  <p> importing a specific function or subset of the package
* <span style="color:green">from <i>package-name</i> import *  </span>   <p> importing all public names of the package into the current namespace (convenient, but generally discouraged because it hides where each name comes from)
* <span style="color:green">import <i>package-name</i> as <i>short-package-name</i></span>    <p> a very good way to keep track of which package each function comes from

Importing all functionalities as such:

In [None]:
# Two general packages
import os
import sys
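
The import options listed above can be sketched with the standard-library `math` module. This is only an illustration: the choice of `math`, the alias `m` and the variable names are assumptions made for the example.

```python
# Option 1: import the whole package and access names through its namespace
import math
root = math.sqrt(16)
print(root)

# Option 2: import one specific function
from math import floor
rounded_down = floor(2.7)
print(rounded_down)

# Option 4: import under a shorter alias (the alias `m` is an arbitrary choice)
import math as m
print(m.pi)

# Option 3 is left commented out, since it hides where each name comes from:
# from math import *
```

With the alias, `m` and `math` refer to the very same module object.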

In [None]:
pip install tweepy



In [None]:
import tweepy

In [None]:
# tweepy.   <- type the dot and press <TAB> to list the available attributes

## Basic python datatypes

### Numerical types

Python supports the following numerical, scalar types:
* integer
* floats
* complex
* boolean

In [None]:
an_integer = 3

In [None]:
%whos

Variable     Type      Data/Info
--------------------------------
an_integer   int       3
os           module    <module 'os' from '/usr/lib/python3.7/os.py'>
sys          module    <module 'sys' (built-in)>
tweepy       module    <module 'tweepy' from '/u<...>ages/tweepy/__init__.py'>


In [None]:
type(an_integer)

int

In [None]:
an_integer

3

In [None]:
# type casting: converting the integer to a float type
float(an_integer)

3.0

In [None]:
type(an_integer)

int

In [None]:
an_integer = float(an_integer)
an_integer

3.0

In [None]:
a_float = 0.2
type(a_float)

float

In [None]:
a_complex = 1.5 + 0.5j
# get the real or imaginary part of the complex number by using the
# attributes real and imag of the variable
print(type(a_complex), a_complex.real, a_complex.imag)

<class 'complex'> 1.5 0.5


In [None]:
a_boolean = (3 > 4)
a_boolean

False

In [None]:
(3 > 40) & (20>500)

False

In [None]:
(3 < 40) & (20<500)

True

In [None]:
(3 < 40) & (20>500)

False

In [None]:
(3 < 40) | (20>500)

True

In [None]:
(3 > 40) | (20>500)

False

In [None]:
(3 < 40) | (20<500)

True

In [None]:
var_1 = "IAS"
var_1

'IAS'

In [None]:
var_2 = 'IAS2'
var_2

'IAS2'

In [None]:
type(var_2)

str

In [None]:
var_3 = "34"
print(var_3)
print(type(var_3))

34
<class 'str'>


In [None]:
float(var_3)

34.0

In [None]:
int(var_3)

34

In [None]:
var_4 = "Hello"
var_4

'Hello'

In [None]:
int(var_4)

ValueError: invalid literal for int() with base 10: 'Hello'

In [None]:
aa = 32
str(aa)

'32'

In [None]:
a = 10
a = 18
a = 80


In [None]:
%whos

Variable     Type      Data/Info
--------------------------------
a            int       80
a_boolean    bool      False
a_float      float     0.2
aa           int       32
an_integer   float     3.0
os           module    <module 'os' from '/usr/lib/python3.7/os.py'>
sys          module    <module 'sys' (built-in)>
tweepy       module    <module 'tweepy' from '/u<...>ages/tweepy/__init__.py'>
var_1        str       IAS
var_2        str       IAS2
var_3        str       34
var_4        str       Hello


In [None]:
a = str(a)
a_ = float(a)

In [None]:
%whos

Variable     Type      Data/Info
--------------------------------
a            str       80
a_           float     80.0
a_boolean    bool      False
a_float      float     0.2
aa           int       32
an_integer   float     3.0
os           module    <module 'os' from '/usr/lib/python3.7/os.py'>
sys          module    <module 'sys' (built-in)>
tweepy       module    <module 'tweepy' from '/u<...>ages/tweepy/__init__.py'>
var_1        str       IAS
var_2        str       IAS2
var_3        str       34
var_4        str       Hello


In [None]:
azerty = 5.7
int(azerty)

5

In [1]:
round(azerty, 2)

5.7

A Python shell can therefore replace your pocket calculator: the basic arithmetic operations (addition, subtraction, multiplication, division, modulo, ...) are natively implemented:

 *operation*| *python implementation* 
----------:| --------------------- 
 addition  | `+`
 subtraction | `-`
 multiplication | `*`
 division | `/`
 modulo | `%`
 exponentiation | `**`

In [None]:
print(7 * 3.)
print(2**10)
print(8 % 3)  # remainder: 2

21.0
1024
2


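**Attention!** In Python 3, `/` always performs true (floating point) division, even on two integers; use `//` for floor division.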

In [None]:
print(3/2)
print(3/2.)
print(3.//2.)  # floor (integer) division

1.5
1.5
1.0
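
As a small sketch of how floor division and modulo fit together (the variable names are chosen just for this example): the quotient from `//` and the remainder from `%` always rebuild the original number.

```python
a, b = 7, 2

true_division = a / b    # always a float in Python 3
quotient = a // b        # floor division
remainder = a % b        # modulo

print(true_division)
print(quotient, remainder)

# quotient and remainder rebuild the original number
print(quotient * b + remainder == a)

# floor division rounds towards minus infinity, not towards zero
negative_quotient = -7 // 2
print(negative_quotient)
```

Here `negative_quotient` is `-4`, not `-3`, which is worth remembering when working with negative numbers.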


In [None]:
import numpy as np
np.pi

3.141592653589793

### Containers

#### Lists

A list is an ordered collection of objects, that may have different types. The list container supports slicing, appending, sorting ...

Indexing starts at 0 (as in C, C++ or Java), not at 1 (as in Fortran or Matlab)!


In [None]:
une_liste = ["azerty", 12, 12., False, 6778, "djk"]

In [None]:
une_liste[0]  # first element of the list

'azerty'

In [None]:
une_liste[5]

'djk'

In [None]:
une_liste

['azerty', 12, 12.0, False, 6778, 'djk']

In [None]:
une_liste[-1]  # last element (counting from the end)

'djk'

In [None]:
une_liste[-3]

False

In [None]:
a_list = [2.,'aa', 0.2]
a_list

[2.0, 'aa', 0.2]

In [None]:
# accessing individual object in the list
a_list[1]

'aa'

In [None]:
# negative indices are used to count from the back
a_list[-1]

0.2

**Slicing**: obtaining sublists of regularly-spaced elements

In [None]:
another_list = ['first', 'second', 'third', 'fourth', 'fifth']

In [None]:
print(another_list[3:])  # all elements from index 3 ('fourth') to the end

['fourth', 'fifth']


In [None]:
print(another_list[:2])  # all elements from index 0 up to index 2 (index 2 excluded)

['first', 'second']


In [None]:
another_list

['first', 'second', 'third', 'fourth', 'fifth']

In [None]:
print(another_list[::2])  # every second element (stride of 2)

['first', 'third', 'fifth']


In [None]:
print(another_list[::3])

['first', 'fourth']


* Note that `L[start:stop]` contains the elements with indices `i` such that `start <= i < stop` (`i` ranging from `start` to `stop-1`). Therefore, `L[start:stop]` has `stop - start` elements.
* Slicing syntax: `L[start:stop:stride]`
* All slicing parameters are optional
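
The slicing rules above can be checked with a short sketch. The list `letters` and the values of `start`, `stop` and `stride` are arbitrary choices for the illustration.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']
start, stop, stride = 1, 5, 2

# elements with indices start <= i < stop
sub = letters[start:stop]
print(sub)
print(len(sub) == stop - start)

# same range, but taking every second element
strided = letters[start:stop:stride]
print(strided)

# all parameters omitted: a copy of the whole list
full_copy = letters[:]
print(full_copy == letters, full_copy is letters)
```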

Lists are *mutable* objects and can be modified

In [None]:
another_list[3] = 'newFourth'
print(another_list)


['first', 'second', 'third', 'newFourth', 'fifth']


In [None]:
another_list[1:3] = ['newSecond', 'newThird']
print(another_list)  # replaces the values at index 1 and index 2 (index 3 is excluded)

['first', 'newSecond', 'newThird', 'newFourth', 'fifth']


Warning: assigning a list to a second name does not copy it. Both names point to the same object in memory, so changing the list through one name also changes it through the other!

In [None]:
a = ['a',  'b']
b = a
b[0] = 1
print(a)

[1, 'b']
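
If an independent list is wanted instead, one option is to make an explicit copy. The sketch below contrasts the two cases; the names `alias` and `a_copy` are only illustrative.

```python
a = ['a', 'b']

alias = a          # a second name for the same list object
a_copy = a.copy()  # a new list holding the same elements (shallow copy)

alias[0] = 1       # modifies the one shared list

print(a)       # changed through the alias
print(a_copy)  # the copy is unaffected
print(alias is a, a_copy is a)
```

Note that `copy()` is shallow: nested mutable objects inside the list would still be shared.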


**List methods**:

You can always list the available methods in the namespace by using the dir()-command:

In [None]:
#dir(list)

In [1]:
a_third_list = ['red', 'blue', 'green', 'black', 'white']

In [2]:
# Appending
a_third_list.append('pink')  # add the element 'pink' to the end of the list
a_third_list

['red', 'blue', 'green', 'black', 'white', 'pink']

In [3]:
# Removes and returns the last element
a_third_list.pop()
a_third_list

['red', 'blue', 'green', 'black', 'white']

In [None]:
# remove() needs the element to remove as its argument;
# calling it without one raises a TypeError
a_third_list.remove()
a_third_list

TypeError: remove() takes exactly one argument (0 given)

In [None]:
# Extends the list in-place
a_third_list.extend(['pink', 'purple'])
a_third_list

['red', 'blue', 'green', 'black', 'white', 'pink', 'purple']

In [None]:
len(a_third_list)

7

In [None]:
a_fourth_list = ["yellow", "gray", "orange"]
a_third_list.extend(a_fourth_list)
a_third_list

['red',
 'blue',
 'green',
 'black',
 'white',
 'pink',
 'purple',
 'yellow',
 'gray',
 'orange']

In [4]:
# Reverse the list
a_third_list.reverse()
a_third_list

['orange',
 'gray',
 'yellow',
 'purple',
 'pink',
 'white',
 'black',
 'green',
 'blue',
 'red']

In [None]:
# Remove the first occurrence of an element
a_third_list.remove('white')
a_third_list

['orange', 'gray', 'yellow', 'purple', 'pink', 'black', 'green', 'blue', 'red']

In [5]:
# Sort list
a_third_list.sort()
a_third_list

['black', 'blue', 'gray', 'green', 'orange', 'pink', 'purple', 'red', 'yellow']

------------

<div class="alert alert-success">
    <b>EXERCISE</b>: What happens if you put two question marks behind the command?
</div>

------------

In [None]:
a_third_list = ['red', 'blue', 'green', 'black', 'white']

In [None]:
# remove the last two elements
a_third_list = a_third_list[:-2]
a_third_list

['red', 'blue', 'green']

<div class="alert alert-success">
    <b>EXERCISE</b>: Try to code your own  *reverse* command using the appropriate slicing command:
</div>

In [None]:
a_third_list = ['red', 'blue', 'green', 'black', 'white']


In [6]:
a_third_list.reverse()
a_third_list

['white', 'black', 'green', 'blue', 'red']

In [7]:
a_third_list[::-1]

['red', 'blue', 'green', 'black', 'white']

------------

Concatenating lists is just the same as summing both lists:

In [None]:
a_list = ['pink', 'orange']
a_concatenated_list = a_third_list + a_list
a_concatenated_list

<div class="alert alert alert-danger">
    <b>Note</b>: Why is the following not working?
</div>

In [None]:
reverted = a_third_list.reverse()
## uncomment the next lines to test the error:
#a_concatenated_list = a_third_list + reverted
#a_concatenated_list

The `reverse()` method reverses the list in place and returns nothing, so `reverted` is `None`, which cannot be added to a list.
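
A minimal sketch of the difference, using a hypothetical list `colors`: the in-place method returns `None`, whereas slicing with a negative stride produces a new, reversed list that can be concatenated.

```python
colors = ['red', 'blue', 'green']

# reverse() works in place and returns None
result = colors.reverse()
print(result)
print(colors)

# slicing with a stride of -1 builds a new reversed list instead
reversed_copy = colors[::-1]
combined = reversed_copy + ['pink']
print(combined)
```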

------------

In [None]:
# Repeating lists
a_repeated_list = a_concatenated_list*10
print(a_repeated_list)

**List comprehensions**

List comprehensions are a very powerful feature. They provide an in-list for-loop: looping through all the elements of a list and applying an action to each of them, in a single, readable line.

In [9]:
number_list = [1, 2, 3, 4]

# Type 1: iterate over the elements

for ele in number_list:  # for each element of the list
  print(ele)  # print the element

1
2
3
4


In [15]:
number_list = [1, 2, 3, 4]

In [16]:
# Type 2: iterate over the indices with range

a = []
for i in range(0, len(number_list)):
  #print(i)  # show the index of each element in number_list
  a.append(i)  # append the index i to the list a

In [12]:
a

[0, 1, 2, 3]

In [13]:
number_list

[1, 2, 3, 4]

In [17]:
# list comprehension

[herve**2 for herve in number_list]  # [output for element in list]: herve**2 for each element of the list

[1, 4, 9, 16]

In [18]:
number_list = [1, 2, 3, 4]
li = []

for ele in number_list:  # for each element of the list
  li.append(ele**2)  # append the squared element to the list li

In [19]:
li

[1, 4, 9, 16]

In [20]:
# Computation time

%%timeit

[herve**2 for herve in number_list] 

The slowest run took 4.25 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 1.18 µs per loop


In [21]:
%%timeit

number_list = [1, 2, 3, 4]
li = []

for ele in number_list:  # for each element of the list
  li.append(ele**2)  # append the squared element to the list li

The slowest run took 9.51 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 1.3 µs per loop


and with conditional options:

In [22]:
[i**2 for i in number_list if i>1]  # condition: i > 1

[4, 9, 16]
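
To make the conditional form concrete, the sketch below writes the same filter once as an explicit loop and once as a comprehension. The names `squares_loop` and `squares_comprehension` are chosen for the illustration only.

```python
number_list = [1, 2, 3, 4]

# explicit loop: keep the square of every element larger than 1
squares_loop = []
for i in number_list:
    if i > 1:
        squares_loop.append(i**2)

# the same logic as a list comprehension with a condition
squares_comprehension = [i**2 for i in number_list if i > 1]

print(squares_loop)
print(squares_loop == squares_comprehension)
```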

In [None]:
# Let's try multiplying with two on a list of strings:
print([i*2 for i in a_repeated_list])

Cool, this works! Let's look more closely at strings:

#### Strings

Different string syntaxes (simple, double or triple quotes)

In [23]:
s = 'Never gonna give you up'
print(s)
s = "never gonna let you down"
print(s)
s = '''Never gonna run around 
    and desert you'''         
print(s)
s = """Never gonna make you cry, 
    never gonna say goodbye"""
print(s)

Never gonna give you up
never gonna let you down
Never gonna run around 
    and desert you
Never gonna make you cry, 
    never gonna say goodbye


In [4]:
## pay attention when using apostrophes! - test out the next two lines one at a time
#print('Hi, what's up?')
print("Hi, what's up?")

Hi, what's up?


The newline character is **\n**, and the tab character is **\t**.

In [32]:
print('''Never gonna tell a lie and hurt you. Never gonna give you up,\tnever gonna\tlet you down
Never \ngonna\n run around and\t desert\t you''')

Never gonna tell a lie and hurt you. Never gonna give you up,	never gonna	let you down
Never 
gonna
 run around and	 desert	 you


In [33]:
a = 1
if (a == 1):
  print("Hello\tWorld")

Hello	World


Strings are collections like lists. Hence they can be indexed and sliced, using the same syntax and rules.

In [7]:
a_string = "hello"
print(a_string[0])

h


In [8]:
print(a_string[1:5])


ello


In [38]:
print(a_string[-4:-1])  # elements from index -4 up to -1 (-1 excluded)

ell


In [40]:
print(a_string[-4:])

ello


In [15]:
print(a_string[-4:-1:1])

ell


Accents and special characters can also be handled: in Python 3 every string is a Unicode string, so the `u` prefix is optional (see http://docs.python.org/tutorial/introduction.html#unicode-strings).

In [16]:
print(u'Hello\u0020World !')

Hello World !


A string is an immutable object and it is not possible to modify its contents. One may however create new strings from the original one.

In [None]:
#a_string[3] = 'q'   # uncomment this cell
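
As a sketch of what happens when the cell above is uncommented, and of how a modified string can still be obtained: item assignment raises a `TypeError`, while slicing and concatenation build a new string. The name `new_string` is only illustrative.

```python
a_string = "hello"

# item assignment is not supported on strings
try:
    a_string[3] = 'q'
except TypeError as error:
    print("TypeError:", error)

# build a new string from pieces of the original instead
new_string = a_string[:3] + 'q' + a_string[4:]
print(new_string)
print(a_string)  # the original is unchanged
```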

We won't introduce all methods on strings, but let's check the namespace and apply a few of them:

In [None]:
#dir(str) # uncomment this cell

In [46]:
another_string = "Strawberry-raspBerry pAstry package party"
another_string.lower()  # lowercase

'strawberry-raspberry pastry package party'

In [47]:
another_string = "Strawberry-raspBerry pAstry package party"
another_string.upper()

'STRAWBERRY-RASPBERRY PASTRY PACKAGE PARTY'

In [48]:
another_string = "Strawberry-raspBerry pAstry package party"
another_string.lower().replace("r", "l")  # replace every r by an l

'stlawbelly-laspbelly pastly package palty'

In [17]:
str.replace?

In [51]:
another_string = "Strawberry-raspBerry pAstry package party"
another_string.lower().replace("r", "l", 4)  # replace only the first 4 occurrences of r

'stlawbelly-laspberry pastry package party'

In [52]:
another_string.lower().replace("r", "l", 4).upper()

'STLAWBELLY-LASPBERRY PASTRY PACKAGE PARTY'

String formatting controls how values appear in the output. With the older `%` operator it works as follows:

In [18]:
print('An integer: %i; a float: %f; another string: %s' % (1, 0.1, 'string'))

An integer: 1; a float: 0.100000; another string: string


The [`format` string method](https://pyformat.info/) in Python 3 is able to pick a suitable conversion for each value itself:

In [53]:
print('An integer: {}; a float: {}; another string: {}'.format(1, 0.1, 'string'))

An integer: 1; a float: 0.1; another string: string


In [54]:
print('An integer: {2}; a float: {0}; another string: {1}'.format(1, 0.1, 'string'))

An integer: string; a float: 1; another string: 0.1


In [None]:
n_dataset_number = 20
sFilename = 'processing_of_dataset_%d.txt' % n_dataset_number
print(sFilename)
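
Assuming Python 3.6 or later, formatted string literals (f-strings) offer a third way to reach the same result. The sketch below rebuilds the filename from the previous cell; the variable names `filename` and `formatted` are chosen for the example.

```python
n_dataset_number = 20
a_float = 0.1

# the expression between braces is evaluated and inserted into the string
filename = f'processing_of_dataset_{n_dataset_number}.txt'
print(filename)

# a format specification can follow the colon, here two decimals
formatted = f'A float with two decimals: {a_float:.2f}'
print(formatted)
```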

<div class="alert alert alert-success">
    <b>Exercise</b>: With the `dir(list)` command, all the methods of the list type are printed. However, we're not interested in the hidden methods. Use a list comprehension to only print the non-hidden methods (methods with no starting or trailing '_'):
</div>

In [19]:
dir(list)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

In [60]:
dir(list)[0]  # access an element of the list by its index

'__add__'

In [62]:
dir(list)[2]

'__contains__'

In [61]:
dir(list)[-1]

'sort'

In [63]:
# first two characters of the first element

dir(list)[0][0:2]

'__'

In [64]:
dir(list)[-1][0:2]

'so'

In [68]:
a = []
for ele in (dir(list)):  # for each element of the list
  if (ele[0:2] != "__"):  # if the first two characters differ from "__"
    print(ele)
    a.append(ele)  # store the element in the list a

append
clear
copy
count
extend
index
insert
pop
remove
reverse
sort


In [69]:
a

['append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

In [76]:
len(dir(list))

46

In [73]:
b = []
for i in range(0,len(dir(list))):  # for each index of the list
  if (dir(list)[i][0:2] != "__"):  # if the first two characters differ from "__"
    print(dir(list)[i])
    b.append(dir(list)[i])  # store the element in the list b

append
clear
copy
count
extend
index
insert
pop
remove
reverse
sort


In [74]:
b

['append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

In [75]:
# Method 3: list comprehension

[ ele for ele in (dir(list)) if (ele[0:2] != "__")]

['append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']



<div class="alert alert alert-success">
    <b>Exercise</b>: Given the previous sentence `the quick brown fox jumps over the lazy dog`, split the sentence and put all the word-lengths in a list. 
</div>

In [20]:
sentence = "the quick brown fox jumps over the lazy dog"

In [21]:
sentence_2 = "the_quick_brown_fox_jumps_over_the_lazy_dog"

In [22]:
sentence_2.split()

['the_quick_brown_fox_jumps_over_the_lazy_dog']

In [25]:
sentence_2.split("_")  # use _ as the separator

['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

In [23]:
str.split?

In [82]:
sentence.split()

['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

In [86]:
c = []
for ele in sentence.split():  # for each word in the list
  print(len(ele))
  c.append(len(ele))  # append the length of the word to the list c

3
5
5
3
5
4
3
4
3


In [87]:
c

[3, 5, 5, 3, 5, 4, 3, 4, 3]

In [88]:
liste_1 = [len(ele) for ele in sentence.split()]
liste_1

[3, 5, 5, 3, 5, 4, 3, 4, 3]

------------

#### Dictionaries

A dictionary is basically an efficient table that **maps keys to values**. Its elements are accessed by key, not by position (since Python 3.7, a dictionary does preserve the insertion order of its keys).

It can be used to conveniently store and retrieve values associated with a name

In [97]:
# Always key : value combinations, datatypes can be mixed
hourly_wage = {'Jos':10, 'Frida': "hello", 'Gaspard': '13', 23 : 3, "herve" : [1,2,3]}  # key-value object
hourly_wage

{23: 3, 'Frida': 'hello', 'Gaspard': '13', 'Jos': 10, 'herve': [1, 2, 3]}

In [94]:
hourly_wage['Jos']  # the value stored under the key "Jos"

10

In [95]:
hourly_wage["Frida"]

'hello'

In [96]:
hourly_wage[23]  # dictionary[key] gives the value

3

In [98]:
hourly_wage["herve"]

[1, 2, 3]
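
Looking up a key that does not exist with square brackets raises a `KeyError`. As a small sketch of two common ways around that (the dictionary `wages` and the key `'Hugo'` are made up for the example), the `in` operator tests for a key and `get` returns a fallback value.

```python
wages = {'Jos': 10, 'Frida': 9}

# membership test on the keys
has_jos = 'Jos' in wages
has_hugo = 'Hugo' in wages
print(has_jos, has_hugo)

# get() returns None, or a chosen default, when the key is missing
missing = wages.get('Hugo')
with_default = wages.get('Hugo', 0)
print(missing, with_default)
```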

Adding an extra element:

In [99]:
hourly_wage['Antoinette'] = 15  # new key-value pair
hourly_wage

{23: 3,
 'Antoinette': 15,
 'Frida': 'hello',
 'Gaspard': '13',
 'Jos': 10,
 'herve': [1, 2, 3]}

You can get the keys and values separately:

In [100]:
hourly_wage.keys()  # all the keys

dict_keys(['Jos', 'Frida', 'Gaspard', 23, 'herve', 'Antoinette'])

In [101]:
hourly_wage.values()  # all the values

dict_values([10, 'hello', '13', 3, [1, 2, 3], 15])

In [102]:
hourly_wage.items() # all combinations in a list

dict_items([('Jos', 10), ('Frida', 'hello'), ('Gaspard', '13'), (23, 3), ('herve', [1, 2, 3]), ('Antoinette', 15)])

In [103]:
# ignore this loop for now, this will be explained later
for key, value in hourly_wage.items():
    print(key,' earns ', value, '€/hour')

Jos  earns  10 €/hour
Frida  earns  hello €/hour
Gaspard  earns  13 €/hour
23  earns  3 €/hour
herve  earns  [1, 2, 3] €/hour
Antoinette  earns  15 €/hour


<div class="alert alert alert-success">
    <b>Exercise</b> Put all keys of the hourly_wage dictionary in a list as strings.  If they are not yet strings, convert them:
</div>

In [None]:
hourly_wage = {'Jos':10, 'Frida': 9, 'Gaspard': '13', 23 : 3}

In [104]:
str("hello")

'hello'

In [105]:
str(56)

'56'

In [106]:
hourly_wage.keys()  # all the keys of the dictionary

dict_keys(['Jos', 'Frida', 'Gaspard', 23])

In [110]:
d = []
for ele in hourly_wage.keys():
  print(ele)
  d.append(str(ele))  # convert the key to a string and append it to d

Jos
Frida
Gaspard
23


In [111]:
d

['Jos', 'Frida', 'Gaspard', '23']

In [112]:
[str(ele) for ele in hourly_wage.keys()]

['Jos', 'Frida', 'Gaspard', '23']

----------------------------

#### Tuples

Tuples are basically immutable lists. The elements of a tuple are written between parentheses, or just separated by commas

In [None]:
a_tuple = (2, 3, 'aa', [1, 2])
a_tuple

In [None]:
a_second_tuple = 2, 3, 'aa', [1,2]
a_second_tuple

The key concept here is mutable vs. immutable:
* mutable objects can be changed in place
* immutable objects cannot be modified once created
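
The difference can be sketched by trying the same item assignment on a list and on a tuple. The values below are arbitrary and only serve the illustration.

```python
a_list = [2, 3, 'aa']
a_tuple = (2, 3, 'aa')

# a list is mutable: item assignment changes it in place
a_list[0] = 99

# a tuple is immutable: the same assignment raises a TypeError
try:
    a_tuple[0] = 99
except TypeError as error:
    print("TypeError:", error)

print(a_list)
print(a_tuple)
```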