# Declaration

In [None]:
'Character string'

Strings are defined by surrounding text with **apostrophes** or **quotes**: ***'string'*** or ***"string"***, interchangeably.

Both forms are accepted, but the usual practice in Python is to use apostrophes (**'**) because they are simpler and more pleasant to read.

In English, these characters are called *single quote* and *double quote*.

In [3]:
'chaine' == "chaine"

True

A text can be declared on **multiple lines** by using a triple apostrophe (**'''**), the same notation that is often used to write comments spanning several lines.

In this notation it is not necessary to write the **line break** character explicitly.

In [4]:
ma_chaine = '''Text in
multiple lines'''

print(ma_chaine)

Text in
multiple lines
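As a small sketch of what this notation stores (the names `multi_line` and `single_line` are only illustrative), the line break typed inside the triple apostrophes ends up in the string as an ordinary `\n` character:

```python
# The line break typed in the literal is stored as a '\n' character.
multi_line = '''Text in
multiple lines'''
single_line = 'Text in\nmultiple lines'

print(multi_line == single_line)  # True
print(repr(multi_line))           # repr() shows the hidden '\n'
```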


We can get the **number of characters** of a string by using the built-in function **len()** (as in *length*).

In [5]:
len('Strings')

7

# Operations

## Usual operators

Some usual operators can be used with strings.

**Addition**

In [7]:
'Data' + ' ' + 'sciences' 

'Data sciences'

**Multiplication**

In [10]:
3*'HelloWorld'

'HelloWorldHelloWorldHelloWorld'

On the other hand, the **division** and **subtraction** operators are not defined for strings.
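A quick way to see this for yourself is to try the operation and catch the error. This is only a sketch; the variable names are illustrative:

```python
text = 'Data'

try:
    text - 'a'  # subtraction between two strings is not defined
except TypeError as error:
    message = str(error)
    print('TypeError:', message)
```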

## Reassignment operators

Since **+** and **\*** are defined for strings, the corresponding reassignment operators (**+=** and **\*=**) also work.

In [16]:
my_string = 'My string'

In [17]:
my_string += ' complex'

In [18]:
my_string

'My string complex'

In [19]:
my_string *= 2

In [20]:
my_string

'My string complexMy string complex'

## Membership operator

We can check the presence of a sub-string in a string by using the **membership operators**: **in** and **not in**.

In [21]:
'in' in 'My string'

True
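The short sketch below (variable names are illustrative) shows both operators, and that the test is case sensitive:

```python
my_string = 'My string'

found = 'in' in my_string        # True: 'in' appears inside 'string'
wrong_case = 'my' in my_string   # False: the test is case sensitive
absent = 'xyz' not in my_string  # True: 'xyz' does not appear

print(found, wrong_case, absent)
```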

# Indexing

## By individual element

In [73]:
my_string = 'Hello world'

In [23]:
len(my_string)

11

We refer to a specific character of the string with the **[i]** operator, which returns the (i + 1)th character of the string.

We say that i is the index of the character in this string.

The first character of this string has the index 0.

The last character of this string has the index (N - 1), N being the size of the string, that is to say len(my_string).

In [24]:
my_string[0]

'H'

In [25]:
my_string[1]

'e'
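To illustrate the bounds described above, here is a minimal sketch (the name `last_index` is illustrative). The index N itself is already out of range:

```python
my_string = 'Hello world'
last_index = len(my_string) - 1  # N - 1

print(my_string[last_index])  # 'd'

try:
    my_string[len(my_string)]  # index N does not exist
except IndexError as error:
    print('IndexError:', error)
```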

##### From the end

It is also possible to refer to a character by counting from the end: for this we put a **-** (minus sign) in front of the index.

**[-1]** therefore refers to the last character of the string.

In [26]:
my_string[-1]

'd'

## Slice

A subpart of this string can be extracted using the **:** (colon) operator.

**my_string[a:b]** is the string made of the characters at indices **[a, a + 1, ..., b - 2, b - 1]**.

By construction, this string contains **(b - a) characters** of **my_string** (when both indices lie within the string), the first of which is at index **a**.

In [27]:
my_string

'Hello world'

In [28]:
my_string[6:10]

'worl'
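The length rule can be checked directly. The sketch below also shows, as a side remark, that a slice whose upper bound exceeds the string is silently clipped instead of raising an error (names are illustrative):

```python
my_string = 'Hello world'
a, b = 6, 10

part = my_string[a:b]
print(part, len(part) == b - a)  # worl True

# Out-of-range bounds are clipped to the end of the string.
clipped = my_string[6:100]
print(clipped)  # world
```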

##### Omitted bounds

If one of the two indices is omitted, the slice has no limit on that side: it runs **from the beginning** or **to the end**.

In [29]:
my_string[:5]

'Hello'

In [30]:
my_string[5:]

' world'

##### From the end

In [31]:
my_string[-5:]

'world'

In [32]:
my_string[:-3]

'Hello wo'

The counting system is rather intuitive:
* **my_string[-5:]** takes the last 5 characters
* **my_string[:-3]** takes the whole string except the last 3 characters

##### Step

A third parameter can be specified for the slice: the **step**.

If you specify this third parameter **n**, you will not take all the characters, but only one character out of every **n**. A negative step walks through the string backwards.

If this parameter is omitted, it is equal to 1.

In [34]:
my_string

'Hello world'

In [35]:
my_string[::2]

'Hlowrd'

In [36]:
my_string[::3]

'Hlwl'

In [37]:
my_string[::-1]

'dlrow olleH'

In [38]:
my_string[::-2]

'drwolH'
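One way to read a step is as a loop over indices. The following sketch (names are illustrative) rebuilds `my_string[::2]` by hand with `range()`, and shows the common reversal idiom:

```python
my_string = 'Hello world'

every_second = my_string[::2]
# Same result, written out index by index: 0, 2, 4, ...
same_by_hand = ''.join(my_string[i] for i in range(0, len(my_string), 2))

reversed_text = my_string[::-1]  # a step of -1 reads the string backwards

print(every_second, same_by_hand, reversed_text)
```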

# Mutability

A string is an **immutable** object: its content cannot be changed in place. The only way to "modify" it is to bind the variable to a new string with an assignment or reassignment operator (**=**, **\*=** or **\+=**).

All the types studied so far are **immutable**: 

like strings, **int**, **float**, **complex** and **bool** values cannot be modified, only reassigned.

Whether a type is mutable or immutable is an essential property: it determines whether an operation changes the object itself or produces a new one.
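A minimal sketch of what immutability means in practice (the name `alias` is illustrative): assigning to a single character fails, and **+=** builds a new string rather than changing the existing one.

```python
my_string = 'Hello world'

try:
    my_string[0] = 'B'  # in-place modification is refused
except TypeError as error:
    print('TypeError:', error)

alias = my_string  # second name for the same string object
my_string += '!'   # binds my_string to a NEW string

print(alias)      # Hello world
print(my_string)  # Hello world!
```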

# Escape and special characters

## Escape character

As in most languages, there is an escape character, the backslash character.

## Delimiter characters

Of course, the delimiter used to enclose the string (**apostrophe** or **quote**) cannot simply be included as a character in that string:

In [None]:
'be careful when putting ' apostrophes in a string'.

You must precede the delimiter character used (**'** or **"**) with the **escape character** (backslash).

The **backslash character**, given its status, also needs to be escaped within a string: **\\\\**

In [39]:
print('The delimiter character must be preceded by a \\ ')

The delimiter character must be preceded by a \ 


In [44]:
print("The same goes for inverted commas if that's what you're using in a string that \"contains\" them")

The same goes for inverted commas if that's what you're using in a string that "contains" them
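Escaping is not the only option: since both delimiters are allowed, a string containing one kind can simply be enclosed in the other. A short sketch (variable names are illustrative):

```python
escaped = 'It\'s escaped'  # apostrophe escaped inside apostrophes
other_delimiter = "It's escaped"  # same text, enclosed in quotes instead

print(escaped == other_delimiter)  # True

# An escaped backslash is stored as a single character.
backslash = '\\'
print(len(backslash))  # 1
```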


## Special characters

In addition, there are two non-printing characters to know:
- **\n** : ***new line***, which starts a new line
- **\t** : ***tabulation***, which inserts a tab (wider than a normal space)

There is also a **\r** character (***carriage return***), which you will mostly meet because of Windows, where a line break is conventionally written as the two-character sequence **\r\n** (***carriage return - line feed***).
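Although they are written with two symbols in the source code, each of these escapes is a single character in the string. A small sketch (names are illustrative):

```python
newline = '\n'
tab = '\t'
windows_line_end = '\r\n'  # two characters: carriage return + line feed

print(len(newline), len(tab), len(windows_line_end))  # 1 1 2

# repr() displays the escapes instead of interpreting them.
print(repr('col1\tcol2'))
```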

In [46]:
print('The line break is expressed as \n "new line"')

The line break is expressed as 
 "new line"


# Methods

A method is a function applied to an object. 

We won't go into detail here, but remember that to apply the **meth** method to the **obj** object, we use the **obj.meth()** statement.

## Replacing part of a string

The **replace()** method replaces every occurrence of a sub-string with another string and returns the result:

In [48]:
my_string.replace('H', 'B')

'Bello world'

The string itself has not been modified:

In [50]:
my_string

'Hello world'

To modify a character string, the only possibility is to reassign it:

In [51]:
my_string = my_string.replace('e', 'a')

In [52]:
my_string

'Hallo world'

This method can also be used to delete the occurrences of a sub-string:

In [53]:
my_string.replace('a', '')

'Hllo world'

## Searching for and counting a sub-string

**find()** returns the position of the **first character** of the **first occurrence** of the searched sub-string:

In [54]:
my_string.find('r')

8
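Two details of `find()` are worth a quick sketch: it returns -1 when nothing is found, and it accepts an optional start position to look for later occurrences (variable names are illustrative):

```python
my_string = 'Hallo world'

found = my_string.find('r')      # 8
missing = my_string.find('z')    # -1: no exception, just a sentinel value
next_o = my_string.find('o', 5)  # search starts at index 5, so 7

print(found, missing, next_o)
```

Because -1 is also a valid index (the last character), it is safer to test the result before using it to index the string.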

**count()** returns the number of times a sub-string is present in a string:


In [55]:
my_string.count('l')

3

## Cleaning

Remove characters on the left and right of a string (by default, **whitespace**: spaces, tabs and line breaks)

In [56]:
'   123   '.strip()

'123'

The specific methods **lstrip()** and **rstrip()** remove characters only on the left (l) or on the right (r). The argument is read as a set of characters to remove, not as a sub-string.

In [57]:
my_string.rstrip('ld')

'Hallo wor'
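The sketch below (with made-up sample strings) illustrates the two points that tend to surprise: the default covers all whitespace, and the argument is a set of characters removed in any order.

```python
# Default: every leading/trailing whitespace character is removed.
cleaned = '\t 123 \n'.strip()

# 'xy' means "any x or y", not the sub-string 'xy'.
trimmed = 'xxhixyx'.strip('xy')

left_only = '   123   '.lstrip()
right_only = '   123   '.rstrip()

print(repr(cleaned), repr(trimmed), repr(left_only), repr(right_only))
```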

## Separation and join

Split a string into pieces around a separator sub-string and return the list of pieces.

If no parameter is specified, the string is split on runs of **whitespace** (spaces, tabs, line breaks).

In [58]:
my_string.split()

['Hallo', 'world']

In [60]:
my_string.split('o')

['Hall', ' w', 'rld']

Be careful: by construction, the separator is removed from the result, so it does not appear in any element of the list (here, there is no more o).

There is also a derivative of the split() method, the splitlines() method, which splits a text into a list of lines.

In [61]:
colors_string = 'Blue\nRed\nGreen'
print(colors_string)

Blue
Red
Green


In [62]:
colors_string.splitlines()

['Blue', 'Red', 'Green']

The inverse of the **split()** method is a bit surprising in its formulation: **join()** is called on the separator string and concatenates the elements of a list of strings, inserting the separator between them.

In [65]:
join_string = ' and '

In [66]:
join_string.join(['Blue', 'Red', 'Green'])

'Blue and Red and Green'

Or more directly:

In [67]:
' and '.join(['Blue', 'Red', 'Green'])

'Blue and Red and Green'
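To make the "inverse" relationship concrete, here is a small round-trip sketch (names are illustrative). It also shows that `join()` only accepts strings, so other values must be converted first:

```python
my_string = 'Hallo world'

parts = my_string.split('o')  # ['Hall', ' w', 'rld']
rebuilt = 'o'.join(parts)     # the separator is put back between the parts

print(rebuilt == my_string)  # True

# join() needs strings: convert numbers with str() first.
numbers_text = ', '.join(str(n) for n in [1, 2, 3])
print(numbers_text)  # 1, 2, 3
```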

## Case

First, it is important to note that strings are case sensitive.

In [68]:
'a' == 'A'

False

The **lower()** and **upper()** methods return the string in lower or upper case.

In [69]:
print(colors_string.lower())

blue
red
green


In [70]:
print(colors_string.upper())

BLUE
RED
GREEN


The following two methods capitalise only certain letters and put the rest in lower case: **capitalize()** affects the first character of the string, **title()** the first letter of each word.

In [71]:
print(colors_string.capitalize()) 

Blue
red
green


In [72]:
print(colors_string.title()) 

Blue
Red
Green


## Content testing

##### Start and end of string

In [74]:
my_string.endswith('Hello')

False

In [75]:
my_string.startswith('Hello')

True

##### Numeric-only content

In [77]:
'12'.isnumeric()

True

Note that numbers with a decimal point do not pass this test:

In [78]:
'1.2'.isnumeric()

False
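When the goal is to know whether a string can be read as a number, decimals and signs included, one common workaround is to attempt the conversion. The helper `looks_like_number` below is a hypothetical name introduced for this sketch, not a built-in; note that `float()` also accepts forms such as `'1e3'`, `'inf'` or `'nan'`.

```python
def looks_like_number(text):
    """Hypothetical helper: True if float() accepts the text."""
    try:
        float(text)
    except ValueError:
        return False
    return True


print(looks_like_number('1.2'))  # True
print(looks_like_number('-3'))   # True, whereas '-3'.isnumeric() is False
print(looks_like_number('abc'))  # False
```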

##### Alphabetical content only

In [81]:
'Dog'.isalpha()

True

In [82]:
'Dog-Cat'.isalpha()

False

In [83]:
'Dog Cat'.isalpha()

False

##### Content by case

In [84]:
'DOG'.isupper()

True

In [85]:
'dog'.islower()

True

# Formatting

There are four equivalent ways of formatting a string, listed here from the most desirable to the least desirable:
1. Use the string method **format()**
1. Use f-strings (not available before Python 3.6)
1. Use the **%** formatting operator
1. Concatenate strings

For example, let's say we want to get the following string:
```Python
'Paul is 23 years old, he had an average of 13.26'
```
from the data:


In [88]:
age = 23
mean = 13.256489
name = 'Paul'

## format() string method

The places where values are to be inserted are marked with braces **{}**:

In [90]:
'{} is {} years old, he had an average of {}'.format(name, age, mean)

'Paul is 23 years old, he had an average of 13.256489'

Within the braces, **two optional parameters** can be specified:
1. The index of the argument in the list passed to **format()** (or its name, when arguments are passed by keyword), which is convenient when there are repetitions

In [93]:
'{0} is {1} years old, he had an average of {2}'.format(name, age, mean)

'Paul is 23 years old, he had an average of 13.256489'

2. Additional formatting specifications. This mainly involves:
    - Fixing the number of digits after the decimal point with **.Nf**, N being the number of digits
    - Padding an integer with leading zeros with **0Nd**, N being the total width in characters

When this second parameter is specified, it must be preceded by a **:** (colon), which separates it from the first parameter and is required even if the first one is omitted.



In [94]:
'{:.2f}'.format(mean)

'13.26'

In [95]:
'{:04d}'.format(age)

'0023'

In [98]:
'{} is {} years old, he had an average of {:.2f}'.format(name, age, mean)

'Paul is 23 years old, he had an average of 13.26'
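The two parameters can be combined in the same pair of braces. The sketch below reuses the values from this section; the field names `who` and `years` are illustrative keyword names, not anything imposed by `format()`.

```python
name, age, mean = 'Paul', 23, 13.256489

# Index before the colon, format specification after it; {0} is reused.
repeated = '{0} is {1}. {0} averaged {2:.2f}'.format(name, age, mean)

# Keyword arguments can replace positional indices.
named = '{who} is {years:03d}'.format(who=name, years=age)

print(repeated)  # Paul is 23. Paul averaged 13.26
print(named)     # Paul is 023
```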

## F-strings

F-strings are a recent addition (they do not work on versions below Python 3.6) and very practical.

Some may find this representation a little less readable.

To define an f-string, prefix the literal with **f** and put the variable names (or any expression) directly in the braces, in place of the index.

Apart from that, the syntax is identical to that of the format() method.

In [99]:
f'{name} is {age} years old, he had an average of {mean:.2f}'

'Paul is 23 years old, he had an average of 13.26'

Unlike with **format()**, the string cannot be kept as a reusable template here: formatting is **immediately applied**, using the values the variables have at that moment.
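The difference can be sketched as follows (the names `template` and `snapshot`, and the sample data, are illustrative): a `format()` template is a plain string that can be filled in several times, while an f-string is evaluated once, where it is written.

```python
# A format() template can be stored and reused with different values.
template = '{} is {} years old'
for person, years in [('Paul', 23), ('Anna', 31)]:
    print(template.format(person, years))

# An f-string is evaluated immediately: later changes have no effect.
name, age = 'Paul', 23
snapshot = f'{name} is {age} years old'
age = 24
print(snapshot)  # Paul is 23 years old
```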

## % formatting operator

In [None]:
'%s is %d years old, he had an average of %.2f' % (name, age, mean)
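For comparison, the fourth approach from the list, plain concatenation, could look like the sketch below. It assumes the same three values as in the rest of this section; numbers must be converted with `str()`, and `round()` is used here as a simple stand-in for **.2f** (unlike **.2f**, it does not pad with trailing zeros).

```python
name, age, mean = 'Paul', 23, 13.256489

# Every non-string value has to be converted explicitly.
concatenated = (name + ' is ' + str(age)
                + ' years old, he had an average of ' + str(round(mean, 2)))

# Same sentence with the % operator, for reference.
with_percent = '%s is %d years old, he had an average of %.2f' % (name, age, mean)

print(concatenated)
print(concatenated == with_percent)  # True
```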