<a href="https://colab.research.google.com/github/Hatim-Bhavnagar/Data-with-Python/blob/main/03_Working_with_strings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Working with Strings

---
### String data

CSV (Comma Separated Values) is one of the most compact file formats for storing table data.  A CSV file is a plain text file in which table data is organised in rows, with the values in each row separated by commas.  CSV files can be opened in spreadsheet software but are typically much smaller than the equivalent Excel file.  

This makes them a popular choice for storing data and transferring it over networks.

Learning to work with strings (finding substrings, changing case and so on) is valuable when cleaning and transforming data.

---
## Creating string data and joining strings together

String variables are often used to hold names of things, but are also used to hold descriptions, messages, etc.  To assign string data to a variable, enclose the text in quotation marks ("").  

`name = "Monty"`  
`description = "A large snake"`

We often need to join strings together.  This might be to add a prefix to a filename, create an identifier, etc.

Here is an example.  A function will add "Q1_" to the beginning of a `filename`.

```
new_filename = "Q1_" + filename
```
So if `filename` is "inventory.xlsx" then `new_filename` will be "Q1_inventory.xlsx".

Here is another example.  A function has done something and there has been an error with a `code` and a `description`.  The function needs to show an error message, so it creates a message with the code and description included.

If the `code` is "404" and the `description` is "file not found", then the message can be generated like this:

`message = "Error " + code + " has occurred: " + description`

Now message will have the value "Error 404 has occurred: file not found"

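The two joins above can be tried out with a short sketch like the one below. The function names `add_prefix` and `build_error_message` are made up for illustration and are not part of the worksheet. It also assumes the error code might arrive as a number, in which case it has to be converted with `str()` because `+` only joins strings to strings.

```python
def add_prefix(filename, prefix="Q1_"):
    # Join the prefix and the filename into one new string
    return prefix + filename


def build_error_message(code, description):
    # str() makes the join work even if code arrives as a number such as 404
    return "Error " + str(code) + " has occurred: " + description


new_filename = add_prefix("inventory.xlsx")
message = build_error_message(404, "file not found")
print(new_filename)  # Q1_inventory.xlsx
print(message)       # Error 404 has occurred: file not found
```

Joining never changes the original strings; each `+` builds a new string from the pieces.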




---
### Exercise 1 - input and strings

**Scenario**

A system is needed to help collect personal data.  You have been asked to create a function that will collect a first, middle and last name.

**Recap**: to get input from the user and store it in a variable, use the instruction  

*variableName = input("message asking user for required information")*  

*Remember: Unless you state otherwise, all input is String data*  

**Your task**

Write a function called **show_details()** which will:

*  ask the user to input a **first_name**  
*  ask the user to input a **middle_name**  
*  ask the user to input a **surname**  
*  print the name in the format: `surname`, `first_name` `middle_name`  

**Test Inputs**:  
Peter  
Barry  
Burrows  

**Expected output**:  
Burrows, Peter Barry


In [1]:
def show_details():
   # add your code below here to ask the user to input a first, middle and a last name and print them in the correct format.
   first_name = input("Please enter your first name: ")
   middle_name = input("Please enter your middle name: ")
   surname = input("Please enter your surname: ")
   print(surname + ", " + first_name + " " + middle_name)

show_details()

Please enter your first name: Peter
Please enter your middle name: Barry
Please enter your surname: Burrows
Burrows, Peter Barry


# Substrings

---

A String is a sequence of characters. Each String has a length and each character in the String has a position, or index. Indexing starts at 0, so the first character of a String is at position 0. Because Strings have varying lengths, you can also count backwards from the end: the last character of the String is always at position -1. To find any character in the String, count forwards from 0 or, if you want to count backwards from the end, count back from -1.  

You can get a particular character, or a set of consecutive characters (substring) from a String using [] and the positions of the characters you want.  

Example:  
To get the first letter of a String (in this case the String is called **name**)   
**first_letter** = `name`[0]  
**second_letter** = `name`[1]  
**sixth_letter** = `name`[5]  

To get the last letter of name:  
**last_letter** = `name`[-1]  
**second_last_letter** = `name`[-2]  
**third_last_letter** = `name`[-3]  

Have a go

In [3]:
def show_letters():
   name = "Robert"
   # How to get a particular letter from string and print it.
   first_letter = name[0]
   print("First Letter :" + first_letter)
   second_letter = name[1]
   print("Second Letter :" + second_letter)
   sixth_letter = name[5]
   print("Sixth Letter :" + sixth_letter)
   last_letter = name[-1]
   print("Last Letter :" + last_letter)
   second_last_letter = name[-2]
   print("Second Last Letter :" + second_last_letter)
   third_last_letter = name[-3]
   print("Third Last Letter :" + third_last_letter)

show_letters()

First Letter :R
Second Letter :o
Sixth Letter :t
Last Letter :t
Second Last Letter :r
Third Last Letter :e

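As a rough sketch of what happens at the edges: valid positions run from 0 to `len(name) - 1`, and asking for a position outside that range raises an `IndexError`. The function name `letter_at` is hypothetical, and returning an empty string for a bad position is just one possible choice.

```python
def letter_at(text, position):
    # Return the character at position, or "" if the position is out of range
    try:
        return text[position]
    except IndexError:
        return ""


name = "Robert"
print(letter_at(name, 0))    # R
print(letter_at(name, -1))   # t
print(letter_at(name, 6))    # prints an empty line: "Robert" has no position 6
```

"Robert" has 6 characters, so the highest valid position is 5 and the lowest is -6.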

To get a set of letters from a String, use the slice operator ":".  

*substring = stringname[ index of first letter to be included : index of last letter + 1 ]*

Example:  

**first_three_letters** = `name`[0:3]  
**second_to_fifth_letters** = `name`[1:5]  
**last_three_letters** = `name`[-3:]  

*Note: for the last letters we can't add 1 to -1, as the result would be 0, which is the position of the first letter.  Instead we omit the second number to indicate that we want all characters to the end of the String.*

In [4]:
def show_substrings():
   # How to get a particular letter, or set of letters from string and print them.
   name = "Robert"
   first_three_letters = name[0:3]
   print("First Three Letters :" + first_three_letters)
   second_to_fifth_letters = name[1:5]
   print("Second to Fifth Letters :" + second_to_fifth_letters)
   last_three_letters = name[-3:]
   print("Last Three Letters :" + last_three_letters)

show_substrings()

First Three Letters :Rob
Second to Fifth Letters :ober
Last Three Letters :ert

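The slice rules can be checked with a small sketch such as the one below. It is only an illustration using the same `name` value as the cell above; the other variable names are assumptions.

```python
name = "Robert"

first_three = name[:3]     # omitting the first number means "from position 0"
last_three = name[-3:]     # omitting the second number means "to the end"
empty_slice = name[-3:0]   # position 0 comes before position -3, so nothing is selected
whole_name = name[:]       # omitting both numbers selects the whole string

print(first_three)         # Rob
print(last_three)          # ert
print(repr(empty_slice))   # ''
print(whole_name)          # Robert
```

Unlike a single index, a slice that runs past the end of the string does not raise an error; it simply returns whatever characters are available.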

---
### Exercise 2 - substrings

Write a function called **show_substring()** which will:

Ask the user to input a **name** of at least 5 letters  
Print the **second** to the **fourth** letters in the `name`  

**Test Inputs**:  
Bartholomew  

**Expected output**:  
art


In [5]:
def show_substring():
   # How to get a substring from string
   name  = input("Please enter a name of at least 5 letters: ")
   second_to_fourth_letters = name[1:4]
   print("Second to Fourth Letters :" + second_to_fourth_letters)

show_substring()

Please enter a name of at least 5 letters: Bartholomew
Second to Fourth Letters :art


---
### Exercise 3  - Formatting String output  

Write a function called **show_formatted()** which will:  

*  ask the user to input a **house_number**
*  ask the user to input a **road_name**, then a **town**, then a **postcode**  
*  print the three lines of the address (`house_number` followed by a comma and the `road_name`, then the `town`, then the `postcode`)  

**Test Input**:  
10  
Old Road  
Chatham  
ME4 1AA  

**Expected output**:   
10, Old Road  
Chatham  
ME4 1AA  

In [7]:
def show_formatted():
   # How to get the house number, road name, town and postcode and print them formatted.
   house_number = input("Please enter your house number: ")
   road_name = input("Please enter your road name: ")
   town = input("Please enter your town: ")
   postcode = input("Please enter your postcode: ")
   print()
   print(house_number + ", " + road_name)
   print(town)
   print(postcode)

show_formatted()

Please enter your house number: 10
Please enter your road name: Old Road
Please enter your town: Chatham
Please enter your postcode: ME4 1AA

10, Old Road
Chatham
ME4 1AA


# String functions

## String length

You can get the length of a String using the len() function   
e.g. **namelength** = len(`name`)  

The len() function can be useful for finding strings that are too long to fit in output boxes, or for checking that the length of a password fits the password rules.

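A minimal sketch of the password check mentioned above might look like this. The limits of 8 and 20 characters and the function name `is_valid_length` are assumptions chosen for illustration, not rules from any real system.

```python
MIN_LENGTH = 8   # assumed rule, for illustration only
MAX_LENGTH = 20  # assumed rule, for illustration only


def is_valid_length(password):
    # True when the number of characters is within the assumed limits
    return MIN_LENGTH <= len(password) <= MAX_LENGTH


print(is_valid_length("snake"))          # False (5 characters)
print(is_valid_length("MontyPython1"))   # True (12 characters)
```

`len()` counts every character, including spaces and punctuation.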

---
### Exercise 4 - String length  

Write a function called **show_namelength()** which will:  

*  ask the user to input a **name**  
*  assign the length of the name to a variable called **name_length**  
*  print the `name` followed by a comma, then `name_length`, then the word “characters”.

**Test Input**:  
William  

**Expected Output**:  
William, 7 characters

In [9]:
def show_namelength():
   # How to get the name, then print the name followed by a comma and its length (number of characters)
   name = input("Please enter your name: ")
   name_length = len(name)
   print(name + ", " + str(name_length) + " characters")

show_namelength()


Please enter your name: William
William, 7 characters


## String Case Conversion  

When you start to compare and search for strings, you will need to be aware that data doesn’t always turn up in the state we would like it.  People often forget to use capital letters at the beginning of their names and will sometimes use all capitals.  It is useful, therefore, to be able to convert a String either to all capitals or all lowercase, depending on how you want to see it.  

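There are three methods for this, which are called on the String itself (here the String variable is called **name**).  `upper()` converts every letter to a capital, `lower()` converts every letter to lowercase, and `capitalize()` makes the first character a capital and the rest lowercase.  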

*   **upper_case_name** = `name.upper()`
*   **lower_case_name** = `name.lower()`
*   **capitalised_name** = `name.capitalize()`

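The three methods can be compared side by side in a short sketch like the one below. The sample values are made up, and comparing two strings in lowercase is shown only as one common way to ignore differences in capitalisation.

```python
name = "mArk benJamIN"

print(name.upper())       # MARK BENJAMIN
print(name.lower())       # mark benjamin
print(name.capitalize())  # Mark benjamin (only the first character is capitalised)

# Comparing in one agreed case ignores differences in capitalisation
typed_name = "MONTY"
stored_name = "Monty"
same_name = typed_name.lower() == stored_name.lower()
print(same_name)          # True
```

None of these methods changes `name` itself; each one returns a new string, which is why the result is normally assigned to a variable.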

---
### Exercise 5 - case conversion  

Write a function called **convert_to_capitals()** which will:  

*  ask the user to input a **name** in lowercase letters
*  assign the upper-case version of the String to the variable **capitalised_name**
*  print the `name` in capitals  

**Test input**:  
jaswinder  

**Expected output**:  
JASWINDER  


In [14]:
def convert_to_capitals():
   # How to convert to uppercase the string and print it
   name = input("Please enter your name in lowercase: ")
   capitalised_name = name.upper()
   print(capitalised_name)

convert_to_capitals()

Please enter your name in lowercase: jaswinder
JASWINDER


---
### Exercise 6 - case conversion  

Write a function called **capitalise_names()** which will:  

*  ask the user to input a **name** and a **surname**  
*  print both `name` and `surname` in lowercase with a capital letter at the start, even if they didn't have capitals when they were typed in  

**Test input**:  
mArk  
benJamIN  

**Expected output**:  
Mark Benjamin


In [13]:
def capitalise_names():
   # How to capitalise the name and surname and print it
   name = input("Please enter your name: ")
   surname = input("Please enter your surname: ")
   capitalised_name = name.capitalize()
   capitalised_surname = surname.capitalize()
   print(capitalised_name + " " + capitalised_surname)

capitalise_names()

Please enter your name: mArk
Please enter your surname: benJamIN
Mark Benjamin


---
### Exercise 7 - Substrings

Write a function called **show_postcode_letters()** which will:   

*  ask the user to enter a **postcode**  
*  assign the first two letters of the postcode to a variable called **postcode_area**  
*  convert the `postcode_area` String to capital letters and assign this to a variable called **capitalised_area**  
*  print the `capitalised_area`  

**Test input**:  
Me4 6bb  

**Expected output**:  
ME


In [15]:
def show_postcode_letters():
   # How to get the first two letters of the postcode and convert them to uppercase and print
   postcode = input("Please enter your postcode: ")
   postcode_area = postcode[0:2]
   capitalised_area = postcode_area.upper()
   print(capitalised_area)

show_postcode_letters()

Please enter your postcode: Me4 6bb
ME


---
### Exercise 8 - Floor division and slicing

Write a function called **show_half_word()** which will:

*  ask the user to enter a **word**  
*  assign the value of half the length of the `word` to a variable called **half_length**  
*  assign the first half of the `word` to a new variable called **half_word**  
*  print `half_word`  

*Hint:  when dividing the length of the word by 2, use floor division (//) so that you get a whole number of letters.  To get the first half of the word, use word[0:half_length]*  

**Test input**:  
Runtime  

**Expected output**:  
Run

In [18]:
def show_half_word():
   # How to get the first half of the word and print it
   word = input("Please enter a word: ")
   half_length = len(word) // 2
   half_word = word[0:half_length]
   print(half_word)

show_half_word()

Please enter a word: Runtime
Run

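The reason for the floor division hint can be seen in a small sketch like this one. The variable names are assumptions for illustration; the point is that `/` always produces a float, and a float cannot be used as a slice position.

```python
word = "Runtime"

true_half = len(word) / 2    # 3.5, a float
floor_half = len(word) // 2  # 3, an int

print(true_half)             # 3.5
print(floor_half)            # 3
print(word[:floor_half])     # Run
print(word[floor_half:])     # time

# A float cannot be used as a slice position
try:
    word[0:true_half]
    float_slice_works = True
except TypeError:
    float_slice_works = False
print(float_slice_works)     # False
```

For a word with an odd number of letters, floor division rounds down, so the second half gets the extra letter.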

---
### Exercise 9 - String repetition

Write a function called **repeat_two_letters()** which will:

*  ask the user to enter a **word**  
*  assign the last two letters of the `word` to a variable called **last_two**  
*  assign a string made of 5 copies of `last_two` to a variable called **five_copies** (*Hint:* `last_two * 5` *will do it*)  
*  print `five_copies`

**Test input**:  
Data  

**Expected output**:  
tatatatata


In [19]:
def repeat_two_letters():
   # How to get the last two letters of the word and repeat them 5 times and print
   word = input("Please enter a word: ")
   last_two = word[-2:]
   five_copies = last_two * 5
   print(five_copies)

repeat_two_letters()

Please enter a word: Data
tatatatata


---
### Exercise 10 - converting String to mixed case

Write a function called **convert_to_upper_threeLetters()** which will:  

*  assign the value “january” to a variable called **month**  
*  assign the first three characters of `month` to a variable called **month_short**   
*  convert `month_short` to upper case and store the result back in `month_short`
*  assign the rest of the characters to a variable called **month_rest**  
*  join `month_short` and `month_rest` together and store the result back into the variable `month`  
*  print `month`  

**Expected output**:  
JANuary  

In [20]:
def convert_to_upper_threeLetters():
   # How to convert the first three letters of the month to uppercase and print
   month = "january"
   month_short = month[0:3]
   month_short = month_short.upper()
   month_rest = month[3:]
   month = month_short + month_rest
   print(month)

convert_to_upper_threeLetters()


JANuary


---
# Takeaways from this worksheet

* text is stored in "string" variables
* a string is a sequence of characters, with positions 0 to (length of string - 1)
* you can get a character from a string using its position either from the beginning, e.g. name[0], or from the end, e.g. name[-1]
* you can join two strings together using the + operator
* you can extract a substring using "slicing" e.g. name[3:5]
* you can find the length of a string using len()
* you can change the case of the letters in a string using `.upper()`, `.lower()` and `.capitalize()`

Data sets often have columns of text data.  It is very common to need to standardise that text data by changing the case, adding prefixes or suffixes, or maybe by splitting off part of the string.

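As a rough sketch of that kind of clean-up, the example below standardises a small list of postcodes. The sample data is made up, and a plain Python list with a `for` loop stands in for a column of a real data set.

```python
# Made-up sample data standing in for a column of postcodes
postcodes = ["me4 6bb", "Me4 1aa", "CT1 2xy"]

cleaned = []
areas = []
for postcode in postcodes:
    upper_postcode = postcode.upper()   # standardise the case
    cleaned.append(upper_postcode)
    areas.append(upper_postcode[0:2])   # split off the first two characters

print(cleaned)  # ['ME4 6BB', 'ME4 1AA', 'CT1 2XY']
print(areas)    # ['ME', 'ME', 'CT']
```

The same two steps, changing case and slicing, are the ones practised in Exercise 7.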






# Your thoughts on what you have learnt  

Please add some comments in the box below to reflect on what you have learnt through completing this worksheet, and any problems you encountered while doing so.


I have learned how to use various functions and operators to handle strings and to extract substrings from them.