
# Strings

Strings in Python are sequences of characters. They can be easily created by just putting some text inside double or single quotes.

In [1]:
a_string = 'This is a string'
another_string = "This is a string, too"

## String formatting

There are several different ways to format strings in Python:

* The ugly way, just adding pieces with '+'
* "Old-school" with the *%* operator
* With the newer *format* method, introduced in Python 2.6 and Python 3.0
* With the even newer *f-strings* (Python 3.6+), which we won't cover here.


In [31]:
# The ugly way: avoid this if you can
name = 'Bob'
age = 15
print('My name is ' + name + ', I am ' + str(age) + ' years old.')

My name is Bob, I am 15 years old.


### Old style string formatting

In [6]:
name = 'Bob'
greetings = 'Hi, %s!' % name # %s is a placeholder for a string
print(greetings)

age = 15
print('I am %d years old' % age) # %d is the placeholder for an integer

pi = 3.1415926535
print('%f' % pi) # %f is for floating point numbers
print('%.9f' % pi) # You can control the number of decimal digits like this

# You can use the % operator to substitute more than one value in a string
print('%s is %d years old' % (name, age)) # Note the parentheses: the values are passed as a tuple

Hi, Bob!
I am 15 years old
3.141593
3.141592654
Bob is 15 years old


### New string formatting

*  The placeholders are indicated by a *{}*
*  Format specifiers are declared inside the placeholder, after a colon
*  The *%* operator at the end of the string is replaced by a call to the *format* method

In [12]:
print('{} is {:d} years old'.format(name, age))
print('{:.6f}'.format(pi))

Bob is 15 years old
3.141593
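As a small sketch of how format specifiers carry over between the two styles, the cell below formats the same number with *%* and with *format*. The variable names `old_style`, `new_style` and `padded` are chosen here only for illustration.

```python
pi = 3.1415926535

# The same precision specifier works in both styles
old_style = '%.3f' % pi
new_style = '{:.3f}'.format(pi)
print(old_style, new_style)  # 3.142 3.142

# A width and an alignment can be added before the precision:
# '>' right-aligns the number in a field of 8 characters
padded = '{:>8.3f}'.format(pi)
print('[' + padded + ']')  # [   3.142]
```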


The new syntax may seem slightly less intuitive, but it is also more powerful:


In [15]:
# We can give names to the placeholders, which helps readability
print('{son} is the son of {mother}'.format(son='Bob', mother='Alice'))

Bob is the son of Alice


In [14]:
# We can also substitute the placeholders later
unformatted_string = '{} is the son of {}' # The placeholders stay empty until format is called
mother = 'Alice'
son = 'Bob'
print(unformatted_string.format(son, mother))

Bob is the son of Alice
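Because the template is an ordinary string, it can be filled in more than once. The sketch below reuses one template with different values and also shows numbered placeholders, which let the same argument appear several times. The extra names ('Tom', 'Carol') are made up for the example.

```python
template = '{} is the son of {}'

# The same template can be formatted several times with different values
first = template.format('Bob', 'Alice')
second = template.format('Tom', 'Carol')
print(first)   # Bob is the son of Alice
print(second)  # Tom is the son of Carol

# Numbered placeholders refer to the arguments by position,
# so one argument can be used more than once
repeated = '{0}, {0}, {1}!'.format('hip', 'hooray')
print(repeated)  # hip, hip, hooray!
```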


Besides *%*, Python strings treat one more character as special: the backslash *\\*.

This is used essentially for two reasons:

1.  Escape other special characters, so that they lose their special meaning and can actually be represented in the string:

  * \\'  (escaped single quote)

  * \\"  (escaped double quote)

  * \\\  (escaped backslash)

2.  Combine with other characters to create special symbols:
  
  * \n 	(New Line)
  * \r 	(Carriage Return)
  * \t 	(Tab)
  * and others...

In [28]:
print('Bob\'s mother is Alice')

Bob's mother is Alice


In [29]:
print('Let\'s start a newline\n\tand give it a little shift!')

Let's start a newline
	and give it a little shift!
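The remaining escapes from the list above can be sketched the same way. The strings below are invented for illustration; the point is that each escape sequence is stored as a single character.

```python
# An escaped double quote inside a double-quoted string
quote = "She said \"hi\""
print(quote)  # She said "hi"

# An escaped backslash produces one literal backslash
path = 'C:\\temp'
print(path)  # C:\temp

# Each escape sequence counts as one character
print(len('\n'), len('\\'), len(path))  # 1 1 7
```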


### Further reading

Learning all about the string formatting syntax is way beyond the scope of this course. You can find a useful list of tricks here:

https://pyformat.info/#string_pad_align

Under the hood, one of the main changes between Python 2 and Python 3 is that strings are Unicode text by default. As a data scientist, you will rarely need to care about the details, but if you are really interested you may take a look at:

https://docs.python.org/3/howto/unicode.htm

or read chapter 4 of the great book 'Fluent Python' by Luciano Ramalho.

# Data Structures

## Lists

A **list** is a powerful data structure that can contain an arbitrary number of elements of any type (even of different types in the same list). The syntax to create a list is the following:



```
list_name = [arg1, arg2, arg3, ...]
```



In [None]:
my_list = [1, 2., 'a string']

Elements in a list can be accessed by position with square brackets *[]*.

In [39]:
# Note: the first element is numbered with zero, not one!
print(my_list[0], my_list[1], my_list[2])
# Negative indices count from the end: -1 is the last element, -2 the second-to-last and so on
print(my_list[-1], my_list[-2], my_list[-3])

1 2.0 a string
a string 2.0 1


Python has a powerful syntax to access a portion of a list: **slices**. Note: slices are a fundamental tool for array and tensor manipulation in the numpy library, which is the foundation of the Python scientific ecosystem.
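A minimal sketch of the slice syntax `list[start:stop:step]` follows. The list `numbers` and the variable names are chosen here just for the example. The `start` index is included, the `stop` index is excluded, and any of the three parts can be omitted.

```python
numbers = [10, 20, 30, 40, 50]

middle = numbers[1:4]        # from index 1 up to, but not including, index 4
first_two = numbers[:2]      # an omitted start means "from the beginning"
last_two = numbers[-2:]      # an omitted stop means "up to the end"
every_other = numbers[::2]   # a step of 2 takes every second element

print(middle)       # [20, 30, 40]
print(first_two)    # [10, 20]
print(last_two)     # [40, 50]
print(every_other)  # [10, 30, 50]
```

A slice always returns a new list, so `numbers` itself is left unchanged.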

Lists in Python are **mutable**. They can be changed, for example, by appending elements at the end or by replacing the content of one or more existing elements.

In [35]:
my_list.append('another_string')
print(my_list)

[1, 2.0, 'a string', 'another_string']
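Replacing existing elements works through assignment. The sketch below starts from the same list content printed above and uses made-up replacement values: an index assignment changes a single element, while a slice assignment changes several at once.

```python
my_list = [1, 2.0, 'a string', 'another_string']

# Replace a single element by assigning to its index
my_list[0] = 100
print(my_list)  # [100, 2.0, 'a string', 'another_string']

# Replace several elements at once by assigning to a slice
my_list[1:3] = ['x', 'y']
print(my_list)  # [100, 'x', 'y', 'another_string']
```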
