<a href="https://colab.research.google.com/github/albertomanfreda/intensive_school_ml/blob/master/Lesson2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Strings

Strings in Python are sequences of characters. They can be easily created by just putting some text inside double or single quotes.

In [1]:
a_string = 'This is a string'
another_string = "This is a string, too"
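
Because a string is a sequence, it supports the usual sequence operations such as *len*, indexing and slicing. The following is a minimal sketch reusing `a_string` from the cell above; the other variable names are only illustrative.

```python
a_string = 'This is a string'

length = len(a_string)       # number of characters in the string
first_char = a_string[0]     # indexing starts at 0
last_char = a_string[-1]     # negative indices count from the end
first_word = a_string[:4]    # slicing returns a new string

# Single and double quotes produce the same kind of object
same = 'text' == "text"

print(length, first_char, last_char, first_word, same)
```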

## String formatting

There are three main ways to format strings in Python:

* "Old-school" with the *%* operator
* With the newer *format* syntax, available since Python 2.6 and Python 3.0
* With the even newer *f-strings* (Python 3.6+), which we won't cover here.


### Old style string formatting

In [6]:
name = 'Bob'
greetings = 'Hi, %s!' % name # %s is a placeholder for a string
print(greetings)

age = 15
print('I am %d years old' % age) # %d is the placeholder for an integer

pi = 3.1415926535
print('%f' % pi) # %f is for floating point numbers
print('%.9f' % pi) # You can control the number of decimal digits like this

# You can use the % operator to substitute more than one value in a string
print('%s is %d years old' % (name, age)) # Note the parentheses!

Hi, Bob!
I am 15 years old
3.141593
3.141592654
Bob is 15 years old


### New string formatting

*  The placeholders are indicated by a *{}*
*  Format specifiers are declared inside the placeholder, after a colon
*  The *%* operator at the end of the string is replaced by a call to the *format* method

In [7]:
print('{} is {:d} years old'.format(name, age))
print('{:.6f}'.format(pi))

Bob is 15 years old
3.141593


The new syntax may seem slightly less intuitive, but it is also more powerful:


In [9]:
# We can define the template first and fill in the placeholders later
unformatted_string = '{} is the son of {}'
mother = 'Alice'
son = 'Bob'
unformatted_string.format(son, mother)

'Bob is the son of Alice'

In [8]:
# We can give names to the placeholders, which helps readability
'{son} is the son of {mother}'.format(son='Bob', mother='Alice') 

'Bob is the son of Alice'
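
As a further illustration of this flexibility, placeholders can also be numbered, so that an argument can be reordered or reused, and named placeholders can be filled from a dictionary. This is a short sketch; `template` and `people` are illustrative names, not part of the lesson.

```python
# Numbered placeholders refer to the positional arguments of format()
template = '{1} is the mother of {0}; {0} is the son of {1}'
sentence = template.format('Bob', 'Alice')
print(sentence)

# A dictionary can be unpacked into named placeholders with **
people = {'son': 'Bob', 'mother': 'Alice'}
named = '{son} is the son of {mother}'.format(**people)
print(named)
```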

Besides *%*, Python treats another character specially inside strings: the backslash *\*.

This is used essentially for two reasons:

1.  Escape other special characters, so that they lose their special meaning and can be represented literally in the string:

  \\'  (escaped single quote)

  \\"  (escaped double quote)

  \\\  (escaped backslash)

2.  Combine with other characters to create special symbols:
  
    * \n: new line
    * \r: carriage return
    * \t: tab
    * \b: backspace
    * \f: form feed
    * \ooo: character with the given octal value
    * \xhh: character with the given hex value
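
The two uses of the backslash listed above can be tried out with a few short strings. This is a minimal sketch; the variable names and sample texts are made up for illustration.

```python
# 1. Escaping characters that would otherwise be special
quoted = 'It\'s a string'      # single quote inside a single-quoted string
path = 'C:\\temp'              # two backslashes in the source give one in the string

# 2. Building special symbols
two_lines = 'first\nsecond'    # \n starts a new line
columns = 'a\tb'               # \t inserts a tab
letter_a = '\x41'              # hex value 41 is the character 'A'

print(quoted)
print(path)
print(two_lines)
print(columns)
print(letter_a)
```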

Learning all about the string formatting syntax is beyond the scope of this course. You can find a useful list of tricks here:

https://pyformat.info/#string_pad_align
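
As a small taste of the padding and alignment options, the sketch below uses the *format* syntax with a field width. The widths chosen here are arbitrary, and `repr` is used only to make the padding spaces visible.

```python
name = 'Bob'
pi = 3.1415926535

right = '{:>10}'.format(name)     # right-align in a field 10 characters wide
left = '{:<10}'.format(name)      # left-align in the same width
centered = '{:^9}'.format(name)   # center in a field 9 characters wide
padded = '{:08.3f}'.format(pi)    # 3 decimal digits, zero-padded to width 8

print(repr(right))
print(repr(left))
print(repr(centered))
print(padded)
```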

Under the hood, one of the main changes in the transition from Python 2 to Python 3 is that strings are Unicode by default. As a data scientist, you are unlikely to need to care about this, but if you are really interested you may take a look at:

https://docs.python.org/3/howto/unicode.htm

or read chapter 4 of the great book 'Fluent Python' by Luciano Ramalho.
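
For a first impression of what this means in practice, the sketch below shows that a Python 3 string counts characters rather than bytes, and that converting to bytes requires an explicit encoding. The sample word and the choice of UTF-8 are assumptions made for illustration.

```python
word = 'caf\u00e9'                   # 'café', written with an escape for the accented letter
encoded = word.encode('utf-8')       # str -> bytes
decoded = encoded.decode('utf-8')    # bytes -> str

print(len(word))        # number of characters
print(len(encoded))     # number of bytes: the accented letter takes two bytes in UTF-8
print(decoded == word)  # decoding with the same encoding recovers the original text
```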

# Data Structures

## Lists

A **list** is a powerful data structure that can contain an arbitrary number of elements of any type (the elements may even be of different types). A list can be created in one of the following ways:

In [None]:
example_list = [1, 2., 'a string']