# S06 Strings

Mit Patel. Version: 1.00 (August 2023)


***

# 6. Variables with characters: Strings

In addition to numeric and logical data, we may want to store text in the computer's memory, such as names, addresses, and so on. This type of data is called a _string_ (short for "character string") and is also known as _literal data_.

Python is a particularly versatile language when it comes to handling this type of data, as we will see in this session. Although the aim of the course is to focus on numerical calculation using Python, it is advisable to have a minimum knowledge of how to work with _strings_.

## 6.1 Generating string variables

There are several ways to store a literal value in a variable:

- The first is to assign the text directly to the variable, enclosed in double quotes (`"`) or single quotes (`'`).

As an example, in the following program excerpt we assign two names to two variables and then print them using the `print` function.
```Python
comp1 = 'ethanol'
comp2 = "water"
print ("{} and {} are miscible.".format (comp1, comp2))
```

Copy it to the next cell, and then run it:

In [None]:
comp1 = "ethanol"
comp2 = "water"
print (" {} and {} are miscible." .format (comp1, comp2))

 ethanol and water are miscible.


In [None]:
comp1 = 'ethanol'
comp2 = "water"
print (f"{comp1} and {comp2} are miscible.")

ethanol and water are miscible.


- The second way to store text in a variable is to assign the result of the `input ()` function to a variable (as we have seen in previous sessions).

Example:
```Python
n_h2o = input ("How many water molecules are there?")
print ("There are {} water molecules.".format (n_h2o))
```

Check it in the cell by running it twice: once entering `5` and once entering the word `five`, without quotes or apostrophes.

In [None]:
n_h2o = input ("How many water molecules are there? ")
print ("There are {} water molecules." .format (n_h2o))

How many water molecules are there?five
There are five water molecules.


In [None]:
n_h2o = input ("How many water molecules are there? ")
print (f"There are {n_h2o} water molecules.")

How many water molecules are there? 00
There are 00 water molecules.


The `input ()` function always returns the user's response as a string, so the content of the `n_h2o` variable is **text** and not a numeric value, even when the user types digits.

If we enter a value made of numeric digits and want to operate with it, remember that we need to convert it using the `float ()` function (to obtain a real number) or the `int ()` function (to obtain an integer).

- Finally, we can assign a text string to a variable using the result of an operation involving at least one literal variable (we'll see later in this session).
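
The points above can be illustrated with a minimal sketch. To keep it runnable without user interaction, the values that `input ()` would return are hard-coded as strings; the variable names and values are only illustrative assumptions.

```python
# Hypothetical values, hard-coded so the sketch runs without user interaction
raw_count = "5"                # what input() would return if the user typed 5
n_h2o = int(raw_count)         # convert the text to an integer
n_atoms = 3 * n_h2o            # arithmetic now works: 3 atoms per H2O molecule

raw_mass = "18.015"            # a real number typed as text
molar_mass = float(raw_mass)   # convert the text to a real number

# A string obtained as the result of an operation on other strings
label = "H2O" + " (" + "water" + ")"

print(n_atoms)
print(molar_mass * n_h2o)
print(label)
```

Note that the conversion only succeeds when the text really represents a number: `int ("five")` stops the program with a `ValueError`.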

### 6.1.1 Length of a string.

Given a variable that contains a string, we can find out its **length** (that is, the number of characters it contains) using the `len ()` function, which returns an integer value.

In the following code snippet we can see an example of using the `len` function.
```Python
phrase = input ("Enter a sentence:")
long = len (phrase)
print ("The input phrase is {} characters". format (long))
```
The integer value (type `int`) returned by the function is assigned to a variable, which in this case has been called `long`.

Note that the blanks in the sentence also count as characters.

In [None]:
phrase = input ("Enter a sentence:")
long = len (phrase)
print ("The input phrase is {} characters" .format (long))

Enter a sentence:1321dshf aaew rhjafe rfafrf 
The input phrase is 28 characters


In [None]:
phrase = input ("Enter a sentence:")
long = len (phrase)
print (f"The input phrase is {len(phrase)} characters")

Enter a sentence:1321dshf aaew rhjafe rfafrf 
The input phrase is 28 characters


In [None]:
phrase = input ("Enter a sentence:")
long = len (phrase)
print (f"The input phrase is {long} characters")

Enter a sentence:79879 retrtff fa f dfdaf a
The input phrase is 26 characters
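
As a small complement, the following sketch applies `len` to fixed strings instead of user input, so the result is the same on every run. The strings are arbitrary examples.

```python
name = "acetic acid"

print(len(name))    # 11: ten letters plus one blank
print(len(""))      # 0: the empty string has no characters
print(len("  "))    # 2: blanks are characters too
```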


## 6.2 Specifying substrings of a string

We should see a string as an **ordered sequence of characters** in which each of them, **including spaces**, occupies a certain position, where **the first character occupies the position with index 0**.

For example, in the string `"ireland"` each character occupies a position (the index is the number below):

> ```python
> ireland
> 0123456
> ```

Thus, for example, the `l` character is in the fourth place (position 3) and the `n` character is in the sixth (position 5) of the string.

If we want to get the value of a specific character in the string, **we need to indicate the index of its position in square brackets**.

     variable_name [index]
    
Check with the following program excerpt that what we just said is true.

> ```python
> comp = "ireland"
> print (comp [3])
> print (comp [5])
> ```

In [None]:
comp = "ireland"
print (comp [3])
print (comp [5])

Try to predict the character written in each of the following _print_ instructions and check:
```Python
comp = "ireland"
print (comp [1])
print (comp [4])
print (comp [6])
```

In [None]:
comp = "ireland"
print (comp [1])
print (comp [4])
print (comp [6])

What if we stored the word "riboflavin" in the variable `comp`? How do we get its last character?

Counting from the end is often more practical, especially when strings are long: the index -1 refers to the last position of the string, -2 to the penultimate, and so on.

Try to predict the character produced by each of the following `print` statements and check it in the cell below:
```Python
comp = "riboflavin"
print (comp [-1])
print (comp [-3])
print (comp [-4])
```

In [None]:
comp = "riboflavin"
print (comp [-1])
print (comp [-3])
print (comp [-4])
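
The relation between negative and non-negative indices can be sketched as follows, using a different word so as not to give away the exercise above. The variable names are illustrative.

```python
word = "ethanol"
n = len(word)

# Index -k refers to the same character as index n - k
print(word[-1], word[n - 1])   # both are the last character
print(word[-2], word[n - 2])   # both are the penultimate character
print(word[-n], word[0])       # both are the first character
```

Valid indices therefore run from `-n` to `n - 1`; anything outside that range (for example `word[n]`) stops the program with an `IndexError`.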

In addition to individual characters, we can also obtain a _substring_ from an existing string variable using what is called a _slice_. To do so, we write the **name of the variable** and then, in square brackets, **the indices of the start and end positions of the substring** separated by a colon, for example:

> ```python
> comp [i: j]
> ```

The above instruction provides the substring of the variable `comp` **starting at index position `i` and ending at index position `j-1`**. Please note that position `j` is not included in the result!

If the starting position is left blank, this indicates from the beginning of the string:

> ```python
> comp [: j] # substring from position 0 to position (j-1)
> ```

If the end position is left blank, the substring runs to the end of the string:

> ```python
> comp [i:] # substring from position i to the last position
> ```

Note that in any case, the colon `:` is required to get a substring.

According to what we just discussed, what will be the character string produced by each of the `print` instructions in the following program snippet?
```Python
comp = "riboflavin"
print (comp [0: 3])
print (comp [3: 6])
print (comp [: 3])
print (comp [2:])
```

Once you're clear, check your prediction:

In [None]:
comp = "riboflavin"
print (comp [0: 3])
print (comp [3: 6])
print (comp [: 3])
print (comp [2:])
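
The slicing rules can be summarised with a short sketch on a different word. One useful consequence of excluding position `j` is that `word[:k]` and `word[k:]` always split the string into two pieces with no overlap.

```python
word = "ethanol"
#       0123456

print(word[0:3])                     # eth  (positions 0, 1 and 2)
print(word[3:])                      # anol (position 3 to the end)
print(word[:3] + word[3:] == word)   # True: the two pieces rebuild the string
print(len(word[2:5]))                # 3, that is, j - i characters
```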

You can also specify a third value, the _step_, which is the increment applied to the index when moving from the start position towards the end position (if it is omitted, the step is 1).

So we can write, for example:
```Python
comp = "riboflavin"
print (comp [::2]) # From beginning to end, every second character
print (comp [1::2]) # From the second character to the end, every second character
```

In [None]:
comp = "riboflavin"
print (comp [::2]) # From beginning to end, every second character
print (comp [1::2]) # From the second character to the end, every second character

Using -1 as the step of a slice makes it easy to reverse a string.

Copy and paste this code into the following cell and check:
```Python
text = input ("Enter a text:")
text_inv = text [:: - 1]
print (text_inv)
```

In [None]:
text = input ("Enter a text:")
text_inv = text [:: - 1]
print (text_inv)
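
As an illustration of what the step is good for, the sketch below reverses a fixed string and uses the reversed copy to test whether a word is a palindrome. The words and variable names are only examples.

```python
word = "ethanol"
print(word[::2])     # every second character, starting at position 0
print(word[::-1])    # the string reversed

# A palindrome reads the same in both directions
candidate = "level"
is_palindrome = candidate == candidate[::-1]
print(is_palindrome)
```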

## 6.3 String operations

The sum or product operators can be applied to the strings, but they have a different result than the arithmetic operations:

- We can use the "` + `" operator to **concatenate** two strings. Concatenating two strings of text means joining them together.

Look at the following example and run it below:
```Python
txt1 = input ("Enter a text:")
txt2 = input ("And now another text:")
new_text = txt1 + txt2
print ("This is the result of concatenation: {}". format (new_text))
```

In [None]:
txt1 = input ("Enter a text:")
txt2 = input ("And now another text:")
new_text = txt1 + txt2
print ("This is the result of concatenation: {}". format (new_text))

The previous result when printing `new_text` was a bit ugly: the two texts have been joined without any separation between them. If you want to leave a blank space in the middle, you can concatenate a string containing a single space, for example:
```Python
txt1 = input ("Enter a text:")
txt2 = input ("And now another text:")
new_text = txt1 + " " + txt2 # Note the blank between the quotes
print ("This is the result of concatenation: {}". format (new_text))
```

In [None]:
txt1 = input ("Enter a text:")
txt2 = input ("And now another text:")
new_text = txt1 + " " + txt2 # Note the blank between the quotes
print ("This is the result of concatenation: {}". format (new_text))

- We can use the `*` operator combined with an integer value to **replicate** the string.

In the following example we triple the content of the variable:
```Python
txt = input ("Enter a text:")
triplicate = 3 * txt
print (triplicate)
```
Check it out:

In [None]:
txt = input ("Enter a text:")
triplicate = 3 * txt
print (triplicate)
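
Both operators can be combined, as in the following sketch, which underlines a title with a line of dashes of the same length. It also uses the built-in `str ()` function, which has not appeared in this session so far: `+` only concatenates strings, so a number must be converted to text first.

```python
title = "Miscibility table"
rule = "-" * len(title)       # a line of dashes as long as the title

print(title)
print(rule)
print("ab" * 3)               # ababab
print("H" + str(2) + "O")     # H2O; writing "H" + 2 + "O" would raise a TypeError
```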

## 6.4 String methods.

### 6.4.1 Functions and methods

In Python we have already seen some **functions**, such as those in the `math` module:

```python
import math
x = 0.5
y = math.sin (x)
z = math.log10 (y)
```

Each function has a name and its name is followed by possible **arguments** in parentheses.

```python
function_name (arguments)
```

The function calculates a result that can be used in the part of the program that calls it, often by assigning it to a variable:


```python
res = function_name (arguments)
```

Python also has so-called **methods**, which likewise compute and return a result but are called with a different syntax.

Each type of data is usually associated with certain methods that make it easier to work with (for example, converting the contents of a string variable to uppercase or lowercase).

The syntax for running a method is as follows:

```python
variable_name.method_name (arguments)
```

Usually the result is assigned to a variable:

```python
res = variable_name.method_name (arguments)
```

Note that **the variable to which the method is applied is written first and followed by a dot `.`, the method name, and the parentheses**. If there are arguments they are put in parentheses.
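
The difference in syntax can be seen side by side in this short sketch. The string and variable names are only illustrative; `.upper ()` is one of the string methods described in the rest of this section.

```python
formula = "NaCl"

n_chars = len(formula)     # function: the string is passed as an argument
upper = formula.upper()    # method: called on the string, after a dot

print(n_chars)
print(upper)
print(formula)             # the original string is unchanged
```

String methods never modify the variable they are applied to; they return a new string, which is why the result is normally assigned to a variable.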

#### Convert text to uppercase: the `.upper ()` method

The **`.upper ()`** method converts all alphabetic characters in a _string_ to uppercase.

Check it out with this example:
```Python
txt = input ("Enter a text (in lower case):")
txt_maj = txt.upper ()
print (txt_maj)
```

In [None]:
txt = input ("Enter a text (in lower case):")
txt_maj = txt.upper ()
print (txt_maj)

#### Convert text to lowercase: `.lower ()` method

The **`.lower ()`** method converts all alphabetic characters from a _string_ to lowercase.

Check it out with this other example:
```Python
txt = input ("Enter a text (in uppercase):")
txt_min = txt.lower ()
print (txt_min)
```

In [None]:
txt = input ("Enter a text (in uppercase):")
txt_min = txt.lower ()
print (txt_min)

#### Counting occurrences in a text: the `.count ()` method

The **`.count ()`** method allows us to determine how many times a certain character or string appears within the string. The character or string to search for must be enclosed in double or single quotes.

Check it out with this code snippet:
```Python
txt = input ("Enter a text:")
n_a = txt.count ("a") # Count the "a" characters in the entered text
print (n_a)
```

In [None]:
txt = input ("Enter a text:")
n_a = txt.count ("a") # Count the "a" characters in the entered text
print (n_a)
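
Two details of `.count ()` are worth illustrating with a fixed text: the search is case-sensitive, and the argument may be longer than one character. The sentence below is an arbitrary example.

```python
txt = "Acetone and acetic acid are miscible"

print(txt.count("a"))           # lowercase "a" only
print(txt.count("A"))           # uppercase "A" only
print(txt.lower().count("a"))   # both, after converting the text to lowercase
print(txt.count("ace"))         # the argument can be a longer substring
```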

#### Find in a text: the `.find ()` method

The **`.find ()`** method allows us to find out the position (counting from zero) at which a certain character or string appears within the string. The character or string to search for must be enclosed in double or single quotes.

Note that if the character appears more than once in the string, the `.find ()` method returns the position of **the first occurrence**.

If the character **is not** in the string, the method returns the value -1 (when it is found, the result is always a position, that is, a value greater than or equal to zero). The search is case-sensitive.


Look at the following code. What values do you think it will print?
```Python
txt1 = "now"
ipos = txt1.find ('a')
print (ipos)
txt2 = "Now"
ipos = txt2.find ('a')
print (ipos)
txt3 = "We are here now"
ipos = txt3.find ('a')
print (ipos)
txt4 = "We are here"
ipos = txt4.find ('a')
print (ipos)
```

Make your prediction and run it in the following cell:

In [None]:
txt1 = "now"
ipos = txt1.find ('a')
print (ipos)
txt2 = "Now"
ipos = txt2.find ('a')
print (ipos)
txt3 = "We are here now"
ipos = txt3.find ('a')
print (ipos)
txt4 = "We are here"
ipos = txt4.find ('a')
print (ipos)

If you want to determine the position of later occurrences of the character or string, pass a second argument to `.find ()`: the position at which the search should start. Starting one position after the previous match finds the next occurrence.

You can see it by running this example in the following cell:
```Python
txt = "acetone: water: ethanol: chloroform"
ipos1 = txt.find (":") # Position of the first ":" in the text
print (ipos1)
ipos2 = txt.find (":", ipos1 + 1) # Position of the next ":", searching from just after the first one
print (ipos2)
```

In [None]:
txt = "acetone: water: ethanol: chloroform"
ipos1 = txt.find (":") # Position of the first ":" in the text
print (ipos1)
ipos2 = txt.find (":", ipos1 + 1) # Position of the next ":", searching from just after the first one
print (ipos2)
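
The same idea can be repeated until `.find ()` returns -1, which locates every occurrence. The following is a minimal sketch of that loop, assuming the `while` structure from earlier sessions; the variable `count` is introduced here only for illustration.

```python
txt = "acetone:water:ethanol:chloroform"

count = 0
ipos = txt.find(":")
while ipos != -1:                    # -1 means there are no more occurrences
    print(ipos)
    count += 1
    ipos = txt.find(":", ipos + 1)   # continue just after the last match

print(count == txt.count(":"))       # True: the loop found every ":"
```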

### 6.4.2 The `.split ()` method

A very useful string method that we will use later is `.split ()`. This method allows you to **separate** a string, that is, split it into several strings.

When the `.split ()` method is called with nothing in the parentheses, the string is split wherever one or more whitespace characters are found.

Check it out with the following example:
```Python
txt = "Acetone and water are miscible"
words = txt.split () # "list" is avoided as a name because it is a Python built-in
print (words)
```

In [None]:
txt = "Acetone and water are miscible"
words = txt.split () # "list" is avoided as a name because it is a Python built-in
print (words)

The result is the phrase split into pieces and stored in a **list** of strings. The list is a new type of data (a data collection) that we will see in the next session.

However, we do not always have to separate a string by whitespace. We can separate by any character, for example by `:`.

Check it out with this example:
```Python
txt = "acetone:water:ethanol:chloroform"
compounds = txt.split (":")
print (compounds)
```

In [None]:
txt = "acetone:water:ethanol:chloroform"
compounds = txt.split (":")
print (compounds)

_Note_: The separator used by the `.split ()` method **disappears** from the result (the blanks in the first example and the `:` characters in the second).
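
A typical use of `.split ()` in numerical work is to separate a name from a value stored in the same line of text. The sketch below assumes a hypothetical `name:value` record with an illustrative number; the pieces of the resulting list are accessed with square brackets, in the same way as the characters of a string.

```python
record = "ethanol:78.37"          # hypothetical "name:value" record

parts = record.split(":")         # ['ethanol', '78.37']
name = parts[0]
boiling_point = float(parts[1])   # the second piece is still text until converted

print(name)
print(boiling_point + 273.15)     # arithmetic is possible after float()
```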

## 6.5 Alternative structures and strings

Alternative structures (`if` ... `else`) can be applied to strings in different ways.

For example, given two string variables, we can ask whether they have the same value, that is, whether they contain exactly the same text.

Run the following code in the cell to see its result, once entering the same text twice and once entering different texts:
```Python
txt1 = input ("Enter a text:")
txt2 = input ("And now another text:")
if txt1 == txt2:
     print ("They're the same")
else:
     print ("They are different")
```

In [None]:
txt1 = input ("Enter a text:")
txt2 = input ("And now another text:")
if txt1 == txt2:
     print ("They're the same")
else:
     print ("They are different")
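
The comparison with `==` is exact: uppercase and lowercase letters are different characters, and blanks count. A minimal sketch with fixed strings, using `.lower ()` to compare while ignoring case:

```python
txt1 = "Water"
txt2 = "water"

print(txt1 == txt2)                   # False: "W" and "w" are different characters
print(txt1.lower() == txt2.lower())   # True: compared after converting to lowercase
print("water" == "water ")            # False: the trailing blank counts
```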

We may also wonder if a string contains a certain character or string.

Run the following code in the cell twice: once entering a text that contains a comma and once entering a text without one:
```Python
txt = input ("Enter a text:")
if ',' in txt:
     print ("There is a comma in the text")
else:
     print ("No comma in text")
```

In [None]:
txt = input ("Enter a text:")
if ',' in txt:
     print ("There is a comma in the text")
else:
     print ("No comma in text")

Note that the condition could also have been reversed with `not`, like this:
```Python
txt = input ("Enter a text:")
if ',' not in txt:
     print ("There is no comma in the text")
else:
     print ("There is a comma in the text")
```
Check it out:

In [None]:
txt = input ("Enter a text:")
if ',' not in txt:
     print ("There is no comma in the text")
else:
     print ("There is a comma in the text")
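
The `in` operator also accepts strings longer than one character, and its answer agrees with what `.find ()` reports. A short sketch with a fixed text chosen for illustration:

```python
txt = "acetone:water:ethanol"

print("water" in txt)          # True
print("Water" in txt)          # False: the test is case-sensitive
print("benzene" not in txt)    # True
print(("water" in txt) == (txt.find("water") != -1))   # True: both tests agree
```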

## 6.6 Repetitive structures and strings

A `for` structure can be used to examine or process each of the characters in a string, one at a time.

In the following example, each of the characters in a text is printed on a separate line.
```Python
txt = input ("Enter a text:")
for character in txt: # For each character in the text
     print (character) # The character is printed
```
Check it out:

In [None]:
txt = input ("Enter a text:")
for character in txt: # For each character in the text
     print (character) # The character is printed

An alternative way to do the same is the following code, which uses an index pointing from the first character to the last:
```Python
txt = input ("Enter a text:")
long = len (txt) # We get the length of the entered text
for i in range (long): # For each index i = 0,1, ..., long-1
     print (txt [i]) # The character at position i is printed
```

In [None]:
txt = input ("Enter a text:")
long = len (txt) # We get the length of the entered text
for i in range (long): # For each index i = 0,1, ..., long-1
     print (txt [i]) # The character at position i is printed

It can be seen that in this case the first option is simpler.
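
A loop over the characters becomes useful when each one has to be tested. The following sketch combines `for`, `if`, `in` and `.lower ()` to count the vowels in a fixed sentence; the sentence and the variable `n_vowels` are illustrative choices.

```python
txt = "Acetone and water are miscible"

n_vowels = 0
for character in txt:                  # visit each character in turn
    if character.lower() in "aeiou":   # lowercase first, so "A" counts as well
        n_vowels += 1

print(n_vowels)
```

The same total could be obtained by adding up `txt.lower ().count (v)` for each vowel `v`, but the loop makes it easy to change the test applied to each character.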



***
