
# Working with Strings

---
### String data

CSV (Comma Separated Values) is one of the most compact file formats for storing table data.  A CSV file is a plain text file in which each row of the table is a line of text, with the values in that row separated by commas.  CSV files can be opened in spreadsheet software but are typically much smaller than the equivalent Excel file.  

This makes them the ideal choice for storing data and transferring it over networks.

Learning to work with strings (finding substrings, changing capitalisation and so on) is valuable when cleaning and transforming data.

---
## Creating string data and joining strings together

String variables are often used to hold names of things, but they are also used to hold descriptions, messages, etc.  To assign string data to a variable, enclose the text in double quotes ("")  

`name = "Monty"`  
`description = "A large snake"`

We often need to join strings together.  This might be to add a prefix to a filename, create an identifier, etc.

Here is an example.  A function will add "Q1_" to the beginning of a `filename`.

```
new_filename = "Q1_" + filename
```
So if `filename` is "inventory.xlsx", then `new_filename` will be "Q1_inventory.xlsx".

Here is another example.  A function has done something and there has been an error with a `code` and a `description`.  The function needs to show an error message, so it creates a message with the code and description included.

If the `code` is "404" and the `description` is "file not found", then the message can be generated like this:

`message = "Error " + code + " has occurred: " + description`

Now message will have the value "Error 404 has occurred: file not found"
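
Both examples can be tried out with the short sketch below. The values are the sample values from the text; the helper name `add_prefix` is hypothetical and only there for illustration.

```python
def add_prefix(filename, prefix="Q1_"):
    """Return filename with prefix joined to the front (hypothetical helper)."""
    return prefix + filename

new_filename = add_prefix("inventory.xlsx")
print(new_filename)  # Q1_inventory.xlsx

code = "404"
description = "file not found"
message = "Error " + code + " has occurred: " + description
print(message)  # Error 404 has occurred: file not found
```

Note that `+` only joins a string to another string. If `code` were the number `404` rather than the text "404", it would need converting first with `str(code)`.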





---
### Exercise 1 - input and strings

**Scenario**

A system is needed to help collect personal data.  You have been asked to create a function that will collect a first, middle and last name.

**Recap**: to get input from the user and store it in a variable, use the instruction  

*variableName = input("message asking user for required information")*  

*Remember: Unless you state otherwise, all input is String data*  
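
As a small illustration of why this matters, the sketch below uses fixed values in place of real keyboard input. The variable names are made up for the example.

```python
# These stand in for values returned by input(), which always returns a string.
first_entry = "10"
second_entry = "5"

joined = first_entry + second_entry           # strings are joined end to end
total = int(first_entry) + int(second_entry)  # converted to numbers, then added

print(joined)  # 105
print(total)   # 15
```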

**Your task**

Write a function called **show_details()** which will:

*  ask the user to input a **first_name**  
*  ask the user to input a **middle_name**  
*  ask the user to input a **surname**  
*  print the name in the format: `surname`, `first_name` `middle_name`  

**Test Inputs**:  
Peter  
Barry  
Burrows  

**Expected output**:  
Burrows, Peter Barry


In [2]:
def show_details():
  # input() already returns a string, so no conversion is needed
  first_name = input("Please provide first name :")
  middle_name = input("Please provide middle name :")
  surname = input("Please provide surname :")

  # required format: surname, first_name middle_name
  print(surname + ", " + first_name + " " + middle_name)

show_details()

Please provide first name :Michal
Please provide middle name :J
Please provide surname :Fox
Fox, Michal J


# Substrings

---

A String is a sequence of characters. Each String has a length and each character in the String has a position or index. Indexing starts at 0, so the first character of a String is at position 0. Because Strings vary in length, Python also lets you count from the end: the last character is at position -1, the one before it at -2, and so on. To find any character in the String, count forwards from 0 or backwards from -1.  

You can get a particular character, or a set of consecutive characters (substring) from a String using [] and the positions of the characters you want.  

Example:  
To get the first letter of a String (in this case the String is called **name**)   
**first_letter** = `name`[0]  
**second_letter** = `name`[1]  
**sixth_letter** = `name`[5]  

To get the last letter of name:  
**last_letter** = `name`[-1]  
**second_last_letter** = `name`[-2]  
**third_last_letter** = `name`[-3]  
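
To see how the two ways of counting line up, here is a minimal sketch using the sample value "Robert". It prints each character with its position from the start and its position from the end.

```python
name = "Robert"  # sample value, 6 characters long

for index in range(len(name)):
    from_end = index - len(name)   # e.g. position 5 is also position -1
    print(index, from_end, name[index])

# The last character can be reached either way.
same_character = name[-1] == name[len(name) - 1]
print(same_character)  # True
```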

Have a go

In [3]:
def show_letters():
   name = "Robert"
   # add your code below here to get a particular letter and print it.
   first_letter = name[0]
   second_letter = name[1]
   sixth_letter = name[5]

   last_letter = name[-1]
   second_last_letter = name[-2]
   third_last_letter = name[-3]

   print(first_letter + second_letter + sixth_letter)
   print(last_letter + second_last_letter + third_last_letter)



show_letters()

Rot
tre


To get a set of letters from a String, use the slice operator ":".  

*substring = stringname[ index of first letter to be included : index of last letter to be included + 1 ]*

Example:  

**first_three_letters** = `name`[0:3]  
**second_to_fifth_letters** = `name`[1:5]  
**last_three_letters** = `name`[-3:]  

*Note: for the last three letters, the end position would be -1 + 1 = 0, which Python would read as the start of the string, so we omit the second number to indicate that we want all characters to the end of the string*
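
The sketch below, using the made-up sample value "Python", shows what happens when either number is left out. It also shows an easy mistake, offered as an illustration rather than as part of the worksheet: a slice always reads from left to right, so `word[-1:-3]` selects nothing.

```python
word = "Python"  # sample value

from_start = word[:3]     # start omitted: begins at position 0
to_end = word[3:]         # end omitted: runs to the end of the string
last_three = word[-3:]    # third-last character to the end
before_last = word[-3:-1] # stops before the last character
empty = word[-1:-3]       # start is to the right of the end, so nothing is selected

print(from_start)   # Pyt
print(to_end)       # hon
print(last_three)   # hon
print(before_last)  # ho
print(repr(empty))  # ''
```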

In [4]:
def show_substrings():
   name = "Robert"
   # add your code below here to get a particular letter, or set of letters and print them.
   first_three_letters = name[0:3]
   second_to_fifth_letters = name[1:5]
   last_three_letters = name[-3:]

   print(first_three_letters)
   print(second_to_fifth_letters)
   print(last_three_letters)


show_substrings()

Rob
ober
ert


---
### Exercise 2 - substrings

Write a function called **show_substring()** which will:

Ask the user to input a **name** of at least 5 letters  
Print the **second** to the **fourth** letters in the `name`  

**Test Inputs**:  
Bartholomew  

**Expected output**:  
art


In [7]:
def show_substring():
  name = input("Please input name with minimum of 5 characters: ")
  # positions 1, 2 and 3 are the second, third and fourth letters
  second_to_fourth = name[1:4]

  print(second_to_fourth)

show_substring()

Please input name with minimum of 5 characters: Bartholomew
art


---
### Exercise 3  - Formatting String output  

Write a function called **show_formatted()** which will:  

*  ask the user to input a **house_number**
*  ask the user to input a **road_name**, then a **town**, then a **postcode**  
*  print the three lines of the address (`house_number` followed by a comma and `road_name`, then `town`, then `postcode`)  

**Test Input**:  
10  
Old Road  
Chatham  
ME4 1AA  

**Expected output**:   
10, Old Road  
Chatham  
ME4 1AA  

In [10]:
def show_formatted():
  # keep the house number as text: it is only printed, and a value such as 10A is not an integer
  house_number = input("Please put in house number: ")
  road_name = input("please put in road name: ")
  town_name = input("Please put in town name: ")
  postcode = input("Please put in postcode: ")

  print(house_number + ", " + road_name)
  print(town_name)
  print(postcode)

show_formatted()

Please put in house number: 10
please put in road name: Old Road
Please put in town name: Chatham
Please put in postcode: ME4 1AA
10, Old Road
Chatham
ME4 1AA


# String functions

## String length

You can get the length of a String using the len() function   
e.g. **namelength** = len(`name`)  

The length function is useful for finding strings that are too long to fit in an output box, or for checking that a password is long enough to meet the password rules.
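
A minimal sketch of both uses follows. The limits `MAX_WIDTH` and `MIN_PASSWORD`, and the sample values, are assumptions made up for the example.

```python
MAX_WIDTH = 10     # assumed width of an output box, in characters
MIN_PASSWORD = 8   # assumed minimum password length

title = "Stock report for March"
password = "snake123"

too_long = len(title) > MAX_WIDTH           # True: the title has 22 characters
shown_title = title[0:MAX_WIDTH]            # keep only the characters that fit
password_ok = len(password) >= MIN_PASSWORD

print(too_long)     # True
print(shown_title)  # Stock repo
print(password_ok)  # True
```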


---
### Exercise 4 - String length  

Write a function called **show_namelength()** which will:  

*  ask the user to input a **name**  
*  assign the length of the name to a variable called **name_length**  
*  print the `name` followed by a comma, then `name_length`, then the word “characters”.

**Test Input**:  
William  

**Expected Output**:  
William, 7 characters

In [12]:
def show_namelength():
  name = input("Please input a name: ")
  name_length = len(name)

  # len() returns a number, so convert it to a string before joining
  print(name + ", " + str(name_length) + " characters")

show_namelength()

Please input a name: William
William, 7 characters


## String Case Conversion  

When you start to compare and search for strings, you will need to be aware that data doesn’t always turn up in the state we would like it.  People often forget to use capital letters at the beginning of their names and will sometimes use all capitals.  It is useful, therefore, to be able to convert a String either to all capitals or all lowercase, depending on how you want to see it.  

There are three methods for this, which are called on the String itself (here the String variable is called **name**).  Each one returns a new String and leaves the original unchanged.  

*   **upper_case_name** = `name.upper()`
*   **lower_case_name** = `name.lower()`
*   **capitalised_name** = `name.capitalize()`
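
The sketch below shows the three methods on a made-up, mixed-case sample value. It also shows why case conversion helps when comparing two spellings of the same text.

```python
name = "jasWINder"  # sample value typed with mixed case

upper_case_name = name.upper()
lower_case_name = name.lower()
capitalised_name = name.capitalize()

print(upper_case_name)   # JASWINDER
print(lower_case_name)   # jaswinder
print(capitalised_name)  # Jaswinder
print(name)              # jasWINder (the original is unchanged)

# Comparing two spellings: convert both to the same case first.
typed = "CHATHAM"
stored = "Chatham"
print(typed == stored)                  # False
print(typed.lower() == stored.lower())  # True
```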


---
### Exercise 5 - case conversion  

Write a function called **convert_to_capitals()** which will:  

*  ask the user to input a **name** in lowercase letters
*  assign the capitalised String to the variable **capitalised_name**
*  print the `name` in capitals  

**Test input**:  
jaswinder  

**Expected output**:  
JASWINDER  


In [15]:
def convert_to_capitals():
  name = input("Please input name in lowercase: ")
  capitalised_name = name.upper()
  print(capitalised_name)

convert_to_capitals()

Please input name in lowercase: jaswinder
JASWINDER


---
### Exercise 6 - case conversion  

Write a function called **capitalise_names()** which will:  

*  ask the user to input a **name** and a **surname**  
*  print both `name` and `surname` in lowercase with a capital letter at the start, even if they didn't have capitals when they were typed in  

**Test input**:  
benJamIN  

**Expected output**:  
Benjamin


In [16]:
def capitalise_names():
  name = input("Please input name: ")
  surname = input("Please input surname: ")

  # capitalize() upper-cases the first letter and lower-cases the rest,
  # so there is no need to call lower() first
  capitalised_name = name.capitalize()
  capitalised_surname = surname.capitalize()

  print(capitalised_name + ", " + capitalised_surname)

capitalise_names()

Please input name: benJamIN
Please input surname: buTToN
Benjamin, Button


---
### Exercise 7 - Substrings

Write a function called **show_postcode_letters()** which will:   

*  ask the user to enter a **postcode**  
*  assign the first two letters of the postcode to a variable called **postcode_area**  
*  convert the `postcode_area` String to capital letters and assign this to a variable called **capitalised_area**  
*  print the `capitalised_area`  

**Test input**:  
Me4 6bb  

**Expected output**:  
ME


In [19]:
def show_postcode_letters():
  postcode = input("Please input postcode: ")
  postcode_area = postcode[0:2]
  capitalised_area = postcode_area.upper()

  print(capitalised_area)

show_postcode_letters()

Please input postcode: Me4 6bb
ME


---
### Exercise 8 - Floor division and slicing

Write a function called **show_half_word()** which will:

*  ask the user to enter a **word**  
*  assign the value of half the length of the `word` to a variable called **half_length**  
*  assign the first half of the `word` to a new variable called **half_word**  
*  print `half_word`  

*Hint:  when dividing the length of the word by 2, use floor division (//) so that you get a whole number of letters.  To get the first half of the word, use word[0:half_length]*  
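
To illustrate the difference between the two kinds of division, here is a short sketch using a made-up 7-letter sample word.

```python
length = len("Strings")  # 7

ordinary = length / 2    # ordinary division gives a decimal number
floored = length // 2    # floor division rounds down to a whole number

print(ordinary)  # 3.5
print(floored)   # 3
```

A slice position must be a whole number, so only the floor division result can be used inside the square brackets.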

**Test input**:  
Runtime  

**Expected output**:  
Run

In [20]:
def show_half_word():
  word = input("Please enter a word: ")
  half_length = len(word) // 2
  half_word = word[0:half_length]

  print(half_word)

show_half_word()

Please enter a word: Runtime
Run


---
### Exercise 9 - String repetition

Write a function called **repeat_two_letters()** which will:

*  ask the user to enter a **word**  
*  assign the last two letters of the `word` to a variable called **last_two**  
*  assign a string made of 5 copies of `last_two` to a variable called **five_copies** (*Hint:* `last_two * 5` *will do it*)  
*  print `five_copies`

**Test input**:
Data

**Expected output**:
tatatatata


In [26]:
def repeat_two_letters():
  word = input("Please input a word: ")
  last_two = word[-2:]  # the last two letters, in their original order
  five_copies = last_two * 5

  print(five_copies)

repeat_two_letters()

Please input a word: Data
tatatatata


---
### Exercise 10 - converting String to lower case

Write a function called **convert_to_lower_threeLetters()** which will:  

*  assign the value “january” to a variable called **month**  
*  assign the first three characters of `month` to a variable called **month_short**   
*  convert `month_short` to upper case and store the result back in `month_short`
*  assign the rest of the characters to a variable called **month_rest**  
*  join `month_short` and `month_rest` together and store the result back into the variable `month`  
*  print `month`  

**Expected output**:  
JANuary  

In [28]:
def convert_to_lower_threeLetters():
  month = "january"
  month_short = month[0:3]
  month_short = month_short.upper()
  month_rest = month[3:]  # omitting the end position means up to the end of the string

  month = month_short+month_rest

  print(month)

convert_to_lower_threeLetters()

JANuary


---
# Takeaways from this worksheet

* text is stored in "string" variables
* a string is a sequence of characters, with positions 0 to (length of string - 1)
* you can get a character from a string using its position either from the beginning, e.g. name[0], or from the end, e.g. name[-1]
* you can join two strings together using the + operator
* you can extract a substring using "slicing" e.g. name[3:5]
* you can find the length of a string using len()
* you can change the case of the letters in a string using `.upper()`, `.lower()` and `.capitalize()`

Data sets often have columns of text data.  It is very common to need to standardise that text data by changing the case, adding prefixes or suffixes, or maybe by splitting off part of the string.
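
As a rough sketch of that kind of clean-up, the example below standardises a made-up column of town names and splits the area letters off a postcode. The sample data and the "UK_" prefix are assumptions for illustration only.

```python
raw_towns = ["chatham", "ROCHESTER", "giLLingham"]  # made-up sample column

clean_towns = []
for town in raw_towns:
    # capitalize() upper-cases the first letter and lower-cases the rest
    clean_towns.append("UK_" + town.capitalize())

print(clean_towns)  # ['UK_Chatham', 'UK_Rochester', 'UK_Gillingham']

# Splitting off part of a string: the first two letters of a postcode
postcode = "me4 6bb"
postcode_area = postcode[0:2].upper()
print(postcode_area)  # ME
```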







# Your thoughts on what you have learnt  

Please add some comments in the box below to reflect on what you have learnt through completing this worksheet, and any problems you encountered while doing so.


It seemed straightforward enough; it follows a similar logic to the last exercise. The part I had an issue with was wrapping my head around [0:3]: despite the end position pointing at the 4th character, it only includes the first 3. I also couldn't figure out how to get a set of characters from [-1:-3].