## Strings

Strings can hold almost anything in character form, including text, symbols, and digits.

As an example, let's take a look at a list (more on lists later) containing 7 strings:

In [7]:
['a simple text',
 "R3a11ySm4rTP@$swo0rD",
 'navidnobani',
 '1366',
 '7945-1414 box',
 "cane",
 """I really hate these 
        !@$&ing trains"""]

['a simple text',
 'R3a11ySm4rTP@$swo0rD',
 'navidnobani',
 '1366',
 '7945-1414 box',
 'cane',
 'I really hate these \n        !@$&ing trains']

In [8]:
"I'cant sleep"

"I'cant sleep"

In [9]:
'I\'cant sleep'

"I'cant sleep"

Roughly, we can define a string as everything that comes between ' ', " " or """ """ (remember that this is a simplistic definition and not the whole story!). You may ask why we need three different ways to create a string. That's a good question! No matter which one you use, the outcome is a string with exactly the same value (for Python, 'cane', "cane" and """cane""" are equal), but in certain situations you have to choose wisely. Suppose we want to create the following string:

> **I can't go to work today**

Let's try " " first:

In [10]:
"I can't go to work today" # works perfectly!

"I can't go to work today"

And now """ """:

In [11]:
"""I can't go to work today""" # another success! :D

"I can't go to work today"

And finally ' ':

In [12]:
'I can't go to work today'

SyntaxError: unterminated string literal (detected at line 1) (2881694459.py, line 1)

Wait, what is this **SyntaxError** now? We get such an error when Python cannot make sense of how the code is written, i.e. it doesn't follow the grammar of the language. To better understand what a **syntax error** is, let's look at an example from natural language:
- This is a correct sentence $\rightarrow $ *Marco eats sandwich* $\rightarrow $ (noun) (verb) (noun)
- This is a **Syntactically wrong** sentence $\rightarrow $ *Marco cat sandwich* $\rightarrow $ (noun) (noun) (noun)

So does this mean that any (noun) (verb) (noun) sequence is a correct sentence? Yes and no! Yes, because (noun) (verb) (noun) is **syntactically correct**, but the sentence might still be meaningless $\rightarrow $ *Marco listens sandwich* (it has a **static semantic error**).

The same thing exists in Python:
- `'Seven'11` is syntactically wrong, because Python can't understand what you mean by (str)(int) $\rightarrow $ **SyntaxError**

- `'seven' * 'two'` is also wrong, but not syntactically: Python understands what you are asking for (multiplying two strings), but it can't do it $\rightarrow $ **TypeError**
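To see the difference in practice, here is a minimal sketch; `error_name` is a hypothetical helper written only for this example. `compile()` first parses the text (this is where a SyntaxError shows up) and `eval()` then runs it (this is where a TypeError shows up).

```python
def error_name(source):
    """Parse and run a Python expression; return the name of the error raised, if any."""
    try:
        code = compile(source, "<example>", "eval")  # parsing step: SyntaxError happens here
        eval(code)                                   # running step: TypeError happens here
    except Exception as err:
        return type(err).__name__
    return None


print(error_name("'Seven'11"))        # SyntaxError
print(error_name("'seven' * 'two'"))  # TypeError
print(error_name("'seven' * 2"))      # None (this one works)
```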


Back to our single-quoted example: can you figure out what the problem is? (Of course you can!) The apostrophe in *can't* is read as the closing quote, so Python sees the string `'I can'` followed by `t go to work today'`, which makes no sense to it.
There are two solutions here:
- using a different kind of quotes, as we did before (" " or """ """)
- using an escape character: in Python (and many other programming languages) certain characters have a reserved role, i.e. they mean something special to Python instead of being plain text. To make Python ignore the special role of characters like `'` or `"` inside a string, we put a backslash before them (`\'`, `\"`):

In [13]:
'I can\'t go to work today'

"I can't go to work today"

Now let's try to write a multi-line string:

In [14]:
'some
thing
long'

SyntaxError: unterminated string literal (detected at line 1) (3438523979.py, line 1)

In [15]:
"some
thing
long"

SyntaxError: unterminated string literal (detected at line 1) (1782343400.py, line 1)

In [16]:
print('just\nwrite\tsomething')

just
write	something


In [18]:
print("""some
thing
long""")

some
thing
long


In [19]:
"""some
thing
long"""

'some\nthing\nlong'

Well, it seems to work, but what are those weird **\n** in the middle? **\n** is one of the [escape sequences of Python string literals](https://docs.python.org/2.0/ref/strings.html) and indicates a new line. We won't go into details here, but remember what we said before about characters with a special meaning and how sometimes we need to escape them. As the last example, let's try to make a string of the following Windows file path:

**C:\Documents\Newsletters\Summer2018.pdf**

In [20]:
print('ciao\ciao')

ciao\ciao


In [21]:
print('C:\Documents\Newsletters\Summer2018.pdf')

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 12-13: malformed \N character escape (1334344037.py, line 1)

As you can see, we're getting a SyntaxError again. The reason is that `\N` is itself an escape sequence: `\N{name}` inserts the Unicode character with the given name, so Python tries to read `\Newsletters` as one and fails because no `{name}` follows. Since our goal here is not an escape sequence but a plain backslash as part of the string, we need to escape the backslash itself. How can we do it?

By using the backslash itself! When Python sees `\\`, it understands that you don't want the special meaning of `\`, but the backslash character itself:

In [22]:
'C:\\Documents\\Newsletters\\Summer2018.pdf'

'C:\\Documents\\Newsletters\\Summer2018.pdf'

In [23]:
print('C:\\Documents\\Newsletters\\Summer2018.pdf')

C:\Documents\Newsletters\Summer2018.pdf
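The escaping rules can also be checked directly. A short sketch (the character name `BULLET` is just one example of a Unicode name): `\N{...}` is the escape Python tried to read in `\Newsletters`, an escape sequence such as `\n` is stored as a single character, and a *raw string* (prefix `r`) turns off escape processing altogether, which is a common way to write Windows paths.

```python
# \N{name} inserts a Unicode character by its official name
bullet = '\N{BULLET}'
print(hex(ord(bullet)))  # 0x2022

# An escape sequence is written with two characters but stored as one
print(len('\n'))   # 1
print(len(r'\n'))  # 2: in a raw string the backslash is kept as-is

# A raw string avoids doubling every backslash in a Windows path
path = r'C:\Documents\Newsletters\Summer2018.pdf'
print(path == 'C:\\Documents\\Newsletters\\Summer2018.pdf')  # True
print(path)  # C:\Documents\Newsletters\Summer2018.pdf
```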


Before moving on to string operations, let's quickly recap what we have seen before. As you may remember, we can convert (cast) floats to integers and vice versa, and also convert a string to a float or an integer. Conversion works the other way around too: we can convert integers and floats to strings:

In [24]:
customer_id = 73456
print(customer_id)
print(type(customer_id))

73456
<class 'int'>


In [25]:
int('43')

43

In [26]:
customer_id_str = str(customer_id)
print(customer_id_str)
print(type(customer_id_str))

73456
<class 'str'>
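Conversions work in both directions. A small sketch with a made-up `price` value; note that `int()` refuses a string containing a decimal point, so such text has to go through `float()` first.

```python
price = 19.99                       # a float
price_str = str(price)              # float -> str
print(price_str, type(price_str))   # 19.99 <class 'str'>

# and back again: str -> float, str -> int
print(float(price_str) == price)    # True
print(int('73456') + 1)             # 73457

# int() only accepts strings that look like whole numbers
try:
    int('19.99')
except ValueError:
    print("'19.99' is not a valid int literal")
```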


### Common string operations

Given a string, you can call a wide variety of string methods on it. Strings are immutable, so these methods never change the original string: they return a new one. There are dozens of them, which you can find [here](https://docs.python.org/2/library/string.html). Let's go through the most commonly used ones:

#### Split

Returns a list of the pieces of the original string, cut wherever the given separator occurs (the separator itself is dropped).

In [27]:
print('a', 'foo', 'bar', 'ham')

a foo bar ham


In [28]:
text = 'This is a Python course'
print(text)
text = text.split(sep=' ')
print(text)

This is a Python course
['This', 'is', 'a', 'Python', 'course']


In [29]:
text = 'I saw the guy, and suddenly he disappeared'
print(text)
text = text.split(',') # notice that I wrote ',' instead of sep=','
print(text)

I saw the guy, and suddenly he disappeared
['I saw the guy', ' and suddenly he disappeared']
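Two details of `split` are worth knowing, shown here in a short sketch with a made-up sentence: called with no separator, it splits on any run of whitespace and drops empty pieces, whereas `split(' ')` cuts at every single space; and the optional `maxsplit` argument limits the number of cuts.

```python
text = 'New   York  City'   # note the repeated spaces

print(text.split())      # ['New', 'York', 'City']
print(text.split(' '))   # ['New', '', '', 'York', '', 'City']

# maxsplit=1: cut only at the first separator
print('a,b,c'.split(',', maxsplit=1))  # ['a', 'b,c']
```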


#### Slicing

Cutting out a part of a string to obtain a new string, which is a substring of the original one.

Actually, before explaining slicing, we need to learn a fundamental concept in Python: indexing.
Suppose you have a box with 11 pens in it. If you take them out one by one and count them, you'll most probably go with:
- One
- Two
- ...
- Eleven

This means you're using a 1-based index: you refer to the first item with the number 1. Python, on the other hand, uses a 0-based index: it refers to the first item with the number (index) 0. So to count the pens in our example, Python goes:
- Zero
- One
- ...
- Ten

OK, now that we know how Python indexing works, let's look at string indexing.
In Python, indexing and slicing are not limited to strings: we can index or slice any *sequence*, i.e. any object that holds an ordered group of items, such as strings and lists. That being said, it should now be clear that the string "cane", while being a single string object, can also be seen as a container (not as a technical definition) of "c", "a", "n" and "e", with indexes 0, 1, 2 and 3. This means we can slice this string using those indexes. Let's see an example:

Suppose we have the string "NEW YORK" and we want to get its first 3 letters. The indexes are shown below:

<img src="../../Images/slice.png" width="500">



In [30]:
'New York'

'New York'

In [31]:
t = 'New York'
t[4:]

'York'

In [32]:
text = 'NEW YORK'
print(text[1]) # E: second letter

print(text[0:3]) # NEW
print(text[4:8]) # YORK

print(text[:3]) # NEW
print(text[4:]) # YORK

E
NEW
YORK
NEW
YORK
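Indexing and slicing treat positions past the end differently. A minimal sketch on the same string: a single index must exist, otherwise Python raises an `IndexError`, while a slice is silently clipped to the string's length and may even come back empty.

```python
text = 'NEW YORK'
print(len(text))          # 8 -> valid indexes are 0..7

print(text[7])            # K: the last character
print(text[4:100])        # YORK: the end of the slice is clipped to the length
print(repr(text[40:80]))  # '': a slice completely past the end is empty

try:
    text[8]               # one past the last valid index
except IndexError as err:
    print('IndexError:', err)  # IndexError: string index out of range
```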


Characters can be addressed not only from the beginning but also from the end, using negative indexes:

<img src="../../Images/slice_2.png" width="500"> 

In [33]:
text = 'NEW YORK'
print(text[-7]) # E: second letter

print(text[-8:-5]) # NEW
print(text[-4:]) # YORK

print(text[:-5]) # NEW


E
NEW
YORK
NEW
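A negative index is simply counted back from the length of the string: `text[-k]` refers to the same character as `text[len(text) - k]`. A quick check on the same string:

```python
text = 'NEW YORK'
n = len(text)  # 8

print(text[-1], text[n - 1])               # K K
print(text[-7] == text[n - 7] == text[1])  # True
print(text[-4:] == text[n - 4:])           # True
```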


Slicing has other options that give you more control. We won't cover them here, but on this [site](https://www.pythoncentral.io/cutting-and-slicing-strings-in-python/) you can find some interesting examples.

In [34]:
parola = 'pipho'

In [35]:
parola[-2]

'h'

#### Replace

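Returns a new version of the original string in which the occurrences of str_to_be_replaced are replaced with str_to_be_replaced_with. An optional third argument limits how many occurrences (counting from the left) are replaced, as in the first example below: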

In [36]:
text = 'I love my dog and my little dog'
print(text.replace('dog', 'cat', 1))

I love my cat and my little dog


In [37]:
t = 'tavolo cavolo'
t.replace('cavolo', 'gatto')

'tavolo gatto'
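Like every string method, `replace()` leaves the original string untouched and returns a new one. A short sketch: without the count argument every occurrence is replaced, and a substring that is not found simply leaves the text as it was.

```python
text = 'I love my dog and my little dog'
new_text = text.replace('dog', 'cat')  # no count: replace every occurrence

print(new_text)  # I love my cat and my little cat
print(text)      # I love my dog and my little dog (unchanged)

# If the substring is not found, the string comes back unchanged
print('tavolo'.replace('sedia', 'gatto'))  # tavolo
```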

#### Concatenation

Concatenation allows you to create a new string by joining existing strings together with "+":

In [38]:
text_1 = 'Hello'
text_2 = 'World!'
text_3 = text_1 + ' ' + text_2

print(text_3)

Hello World!
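`+` only joins strings with strings. A sketch with a hypothetical order number: to put a number into a concatenated string, convert it with `str()` first.

```python
order_number = 42  # an int

# 'Order #' + order_number would raise a TypeError (str + int)
message = 'Order #' + str(order_number) + ' is ready'
print(message)  # Order #42 is ready
```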


#### Strip

Removes the given characters from the beginning and end of the original string. Note: the argument is treated as a set of characters, not as a substring. Also, like split(), strip() works on whitespace by default, so if you run my_text.strip(), it removes the leading and trailing whitespace (spaces, tabs, newlines) from my_text.

In [40]:
text_1 = ' Pirelli Spa '
print(text_1.strip())

text_2 = 'Continental -'
print(text_2.strip('-').strip())

text_3 = '*Python***'
print(text_3.lstrip('*')) # There is also "rstrip"

Pirelli Spa
Continental
Python***
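Because the argument of `strip()` is a set of characters, any combination of them is removed from both ends until a character outside the set is found. A small sketch with made-up strings, plus the right-hand counterpart `rstrip()`:

```python
print('xyxPythonyx'.strip('xy'))   # Python: every leading/trailing x or y is removed
print('xyxPythonyx'.strip('yx'))   # Python: the order of the characters doesn't matter
print('*Python***'.rstrip('*'))    # *Python: right side only
print(repr('\t cane \n'.strip()))  # 'cane': whitespace includes tabs and newlines
```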


Although they are not string methods, **print** and **len** are Python built-in functions (more on this later) that work with many data types: **print** accepts strings, numbers, lists, ..., while **len** works on containers such as strings and lists (but not on numbers).

#### Printing a string

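The **print** function prints(!) its arguments to stdout (standard output). It accepts any Python object, although for some objects the printed text is only a generic description rather than their content. When given several arguments, it separates them with a space, unless you choose another separator with `sep`: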

In [41]:
print('something')
print('-'*40)

text = 'just a short text'
print(text)
print('-'*40)

text_1 = 'My name is'
text_2 = 'Navid'
print(text_1, text_2)
print('-'*40)

text_3 = 'University'
text_4 = 'of'
text_5 = 'Bicocca'
print(text_3, text_4, text_5)
print('-'*40)

text_6 = 'folder1'
text_7 = 'folder2'
text_8 = 'folder3'
text_9 = 'file'
print(text_6, text_7, text_8, text_9, sep='/')

something
----------------------------------------
just a short text
----------------------------------------
My name is Navid
----------------------------------------
University of Bicocca
----------------------------------------
folder1/folder2/folder3/file
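Besides `sep`, `print` has an `end` parameter (by default a newline) that is written after the last argument. A short sketch:

```python
print('folder1', 'folder2', 'file', sep='/')  # folder1/folder2/file
print('no newline here', end=' | ')           # end replaces the final newline
print('same line')                            # -> no newline here | same line
print('a', 1, 2.5)                            # a 1 2.5: non-string arguments are converted
```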


#### String length

**len** returns the number of characters in the given string:

In [42]:
print(len('bicocca'))
print(len('AC DC!'))
print(len(' cane 1'))

7
6
7
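`len` counts every character, spaces and punctuation included, which is why `' cane 1'` has 7 characters. Combining it with `strip()` shows how many of them were leading or trailing whitespace:

```python
word = ' cane 1'
print(len(word))          # 7
print(len(word.strip()))  # 6: the leading space is gone
print(len(''))            # 0: the empty string
```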


<img src="../Images/baby.svg"   width="30" align="left">               

**YOUR TURN:** Replace the spaces in the **target** with the 10th character of the **source**:

In [43]:
source = '\'\'\'ChUpAcHuPs"/"/"/"'
target = 'pP pP pP sS sS sS cC cC cC'

In [44]:
x = source[9]

In [45]:
target.replace(' ', x)

'pPHpPHpPHsSHsSHsSHcCHcCHcC'


#### Other operations

Some methods transform the case of the letters in a string:

In [49]:
text = 'MilaN'
print(text.lower())
print(text.upper())
print(text.title())

milan
MILAN
Milan
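These methods also return new strings. A common use of `lower()` is comparing text without caring about upper or lower case; the city names below are just examples.

```python
city = 'MilaN'
print(city.lower() == 'milan')  # True: case-insensitive comparison
print('new york'.title())       # New York: first letter of every word capitalised
print(city)                     # MilaN: the original is unchanged
```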


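Being a strongly typed language, Python doesn't let you apply an operation to objects of types it isn't defined for, and it doesn't silently convert one type into another. For example, "'Cat' + 3" gives you a "TypeError": "+" can join two strings or add two numbers, but not a string and a number. Subtraction, as below, is not defined for strings at all: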

In [50]:
'cat' - 3

TypeError: unsupported operand type(s) for -: 'str' and 'int'

At the same time, some operations do accept a string and a number together. For example, "'cat' * 3" works, since Python interprets it as repeating the string 'cat' 3 times:

In [51]:
'cat' * 3

'catcatcat'
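String repetition accepts an integer on either side. A few edge cases, as a sketch:

```python
print(repr('cat' * 0))   # '': zero repetitions give the empty string
print(repr('cat' * -2))  # '': negative counts behave like zero
print('-' * 10)          # ---------- (the separator lines used with print above)
print(3 * 'ab')          # ababab: the order of the operands doesn't matter
```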