# Strings

Author: Mike Wood

Learning Outcomes:
By the end of this notebook, you should be able to:
1. Declare strings and access string characters by their indices
2. Manipulate strings with built-in string methods
3. Format numbers as strings with specified round-off and/or padding

## Declaring Strings
Strings are declared in Python using either double quotes or single quotes - and it doesn't matter which! All that matters is that they are opened and closed with the same quote type.

In [1]:
# create a string enclosed in single quotes
first_name = 'Mike'

# create a string enclosed in double quotes
last_name = "Wood"

print(first_name, last_name)

Mike Wood
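
One practical reason to have both quote types is that a string enclosed in one type can contain the other type without any special handling. The short sketch below illustrates this; the variable names and example sentences are made up for illustration.

```python
# a string containing an apostrophe is easiest to write with double quotes
contraction = "It doesn't matter which quote type you use"

# a string containing double quotes is easiest to write with single quotes
quotation = 'She said "hello" and left'

# a backslash escapes a quote of the same type as the enclosing quotes
escaped = 'It doesn\'t matter'

print(contraction)
print(quotation)
print(escaped)
```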


Strings can be concatenated using the `+` operator:

In [2]:
# concatenate the two strings above along with a space in between
print(first_name + ' ' + last_name)

Mike Wood


Unlike some other programming languages, Python does not implicitly convert other data types to strings during concatenation (there is no implicit `toString()` call). In other words, you must convert an object of a different type to a string before concatenating it. Try it for yourself:

In [3]:
# try to concatenate an integer to one of the strings defined above
# print(first_name + 1)
# uncomment the line above: what happens? how can you fix this error?

We can convert other objects to strings simply by passing them to `str()`:

In [4]:
print(first_name + str(1))

Mike1
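
As an alternative to calling `str()` on each value, formatted string literals (f-strings, available in Python 3.6 and later) convert the values placed inside curly braces automatically. The following is a minimal sketch; `age_in_years` is a hypothetical value used only for illustration.

```python
first_name = 'Mike'
age_in_years = 30  # hypothetical value, for illustration only

# explicit conversion with str() before concatenation
message_concat = first_name + ' is ' + str(age_in_years) + ' years old.'

# an f-string converts the values inside the braces to strings automatically
message_fstring = f'{first_name} is {age_in_years} years old.'

print(message_concat)
print(message_fstring)
```

Both approaches produce the same string, so the choice is largely a matter of readability.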


Just like the elements of lists and tuples, the character(s) at a given location in a string can be accessed using square brackets (indices start at 0):

In [5]:
# print the first letter of one of the words above
print('The first letter of ' + first_name +' is '+first_name[0]+'.')

The first letter of Mike is M.


Similarly, a subset of a string can be extracted with slice notation, just as for lists. The slice `[1:3]` starts at index 1 and stops before index 3:

In [6]:
# print the second and third letters of one of the words above
print('The second and third letters of '+first_name+' are '+first_name[1:3]+'.')

The second and third letters of Mike are ik.
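
A few related indexing features are worth knowing. The sketch below shows negative indices, open-ended slices, and the `len` function, and then demonstrates that strings cannot be modified in place. The variable names are chosen only for illustration.

```python
first_name = 'Mike'

# negative indices count backward from the end of the string
last_letter = first_name[-1]

# omitting the start or stop of a slice means "from the beginning" or "to the end"
first_two = first_name[:2]
last_two = first_name[2:]

# len() returns the number of characters in the string
n_chars = len(first_name)

print(last_letter, first_two, last_two, n_chars)

# strings are immutable: assigning to an index raises a TypeError
try:
    first_name[0] = 'N'
except TypeError as error:
    print('Cannot modify a string in place:', error)
```

Because strings are immutable, every method that "changes" a string actually returns a new string.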


## Manipulating Strings
Beyond concatenation and accessing elements in a string, Python has many built-in methods for manipulating strings. The following table lists some of the common methods:

| Method    | Description                                                                                 |
| --------- | ------------------------------------------------------------------------------------------- |
| split()   | Divides a string into a list of substrings at each occurrence of a given separator          |
| join()    | Concatenates a list of strings into a single string, with a specified separator in between  |
| strip()   | Removes the whitespace on the left and right of a string                                    |
| count()   | Returns the number of non-overlapping occurrences of a given substring                      |
| find()    | Returns the index of the first occurrence of a given substring (or -1 if it is not found)   |
| replace() | Replaces each instance of a substring with a given substring                                |
| upper()   | Converts lowercase characters to uppercase characters                                       |
| lower()   | Converts uppercase characters to lowercase characters                                       |
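
As a quick illustration of several methods in the table, the sketch below applies `strip`, `count`, `find`, `upper`, and `lower` to a made-up example string (`raw_text` is hypothetical and not used elsewhere in this notebook).

```python
raw_text = '   banana bread   '  # hypothetical example string

# strip() removes the leading and trailing whitespace
clean_text = raw_text.strip()

# count() returns how many times a substring occurs
n_a = clean_text.count('a')

# find() returns the index of the first occurrence, or -1 if there is none
bread_index = clean_text.find('bread')
missing_index = clean_text.find('z')

# upper() and lower() return new strings with the case changed
shouting = clean_text.upper()
quiet = shouting.lower()

print(repr(clean_text))
print(n_a, bread_index, missing_index)
print(shouting, quiet)
```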

### Splitting and joining
Often, it is helpful to split a string into a list, make some edits, and then rejoin the pieces into a single string. The `split` and `join` methods are called in different ways. The `split` method is called on the string to be divided and takes the separating substring as an argument:
```
string.split(<substring>)
```
Conversely, the join method operates on the substring and takes a list of strings as an argument:
```
<substring>.join(list)
```
For example, consider a string divided across three lines that you would like to join into a single line:

In [7]:
# define a string written across three lines
three_line_str = 'This string is\nsplit across three\ndifferent lines.'
print(three_line_str)
print(' ')

# split the string at the line breaks
three_line_str = three_line_str.split('\n')

# print the resultant list from the previous command with an extra space
print(three_line_str)
print(' ')

# rejoin the string with spaces
three_line_str = ' '.join(three_line_str)

# print the resultant string from the previous command with an extra space
print(three_line_str)
print(' ')

This string is
split across three
different lines.
 
['This string is', 'split across three', 'different lines.']
 
This string is split across three different lines.
 


Most other string methods are called on the string itself (like the `split` method) and return a new string. For example:

In [8]:
# replace the string 'split' with 'no longer split' in the
# string resulting from the previous cell
one_line_str = three_line_str.replace('split','no longer split')
print(one_line_str)

This string is no longer split across three different lines.
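
As a small extension of the split-and-join pattern, calling `split()` with no argument splits on any run of whitespace (spaces, tabs, and line breaks). Because each method returns a new object, the calls can also be chained in a single expression. The sketch below redefines the three-line string so that it runs on its own; the name `collapsed_str` is introduced only for illustration.

```python
three_line_str = 'This string is\nsplit across three\ndifferent lines.'

# split() with no argument splits on any whitespace, including line breaks,
# and join() reassembles the words with single spaces
collapsed_str = ' '.join(three_line_str.split())
print(collapsed_str)

# methods can be chained because each one returns a new string
print(collapsed_str.replace('split', 'no longer split').upper())
```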


#### &#x1F914; Mini-Exercise
Goal: Convert the tab-delimited line to a comma-delimited line that could be opened in Excel. For example, consider the following tab-delimited string:

In [9]:
tab_line = 'City, State\tPopulation\nManhatten, NY\t1,000,000\nSan Jose, CA\t1,000,000'
print(tab_line)

City, State	Population
Manhatten, NY	1,000,000
San Jose, CA	1,000,000


As we can see above, the `\t` characters are tabs that organize the data into columns. In a csv file, columns are separated by commas, which creates a problem when the values in the columns also contain commas. To distinguish each column, we can split the line at the tabs, "wrap" each item in double quotes, and then join the items back into a single string with commas. The double quotes will ensure proper formatting in Excel. Edit the tab line above so your output looks as follows:
```
"City, State","Population"
"Manhatten, NY","1,000,000"
"San Jose, CA","1,000,000"
```

In [10]:
# a tab delimited file is separated by tabs (\t)
tab_line = 'City, State\tPopulation\nManhatten, NY\t1,000,000\nSan Jose, CA\t1,000,000'
print(tab_line)
print(' ')

# split the string by tabs
tab_line = tab_line.split('\t')
print(tab_line)

# for each substring, add double quotes to the string on each end
# you can access each of the 4 substrings individually
tab_line[0] = '"'+tab_line[0]+'"'
tab_line[1] = '"'+tab_line[1]+'"'
tab_line[2] = '"'+tab_line[2]+'"'
tab_line[3] = '"'+tab_line[3]+'"'
print(tab_line)
print(' ')

# join the substring using commas
tab_line = ','.join(tab_line)
print(tab_line)
print(' ')

# wrap the next line escape characters with double quotes so each
# line starts and ends with a double quote
tab_line = tab_line.replace('\n','"\n"')
print(tab_line)


City, State	Population
Manhatten, NY	1,000,000
San Jose, CA	1,000,000
 
['City, State', 'Population\nManhatten, NY', '1,000,000\nSan Jose, CA', '1,000,000']
['"City, State"', '"Population\nManhatten, NY"', '"1,000,000\nSan Jose, CA"', '"1,000,000"']
 
"City, State","Population
Manhatten, NY","1,000,000
San Jose, CA","1,000,000"
 
"City, State","Population"
"Manhatten, NY","1,000,000"
"San Jose, CA","1,000,000"

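The solution above relies on knowing that there are exactly 4 substrings. One possible generalization, sketched below, splits the text into lines first and then quotes every field in a loop, so it works for any number of rows and columns. The names `tab_text`, `csv_lines`, and `csv_text` are introduced here for illustration, and the sketch assumes that no field contains a double quote of its own.

```python
tab_text = 'City, State\tPopulation\nManhatten, NY\t1,000,000\nSan Jose, CA\t1,000,000'

csv_lines = []
for line in tab_text.split('\n'):
    # split each line into its tab-separated fields
    fields = line.split('\t')
    # wrap every field in double quotes
    quoted_fields = ['"' + field + '"' for field in fields]
    # join the quoted fields of this line with commas
    csv_lines.append(','.join(quoted_fields))

# rejoin the lines with line breaks
csv_text = '\n'.join(csv_lines)
print(csv_text)
```

For real data files, Python's built-in `csv` module handles quoting and escaping automatically and is usually the safer choice.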

## Formatting Numbers as Strings
### Floats with decimal places

Often, it is desirable to format numbers as strings. For example, Python, like most programming languages that use binary floating-point arithmetic, is susceptible to round-off errors:

In [11]:
squared_square_root = (3**0.5)**2
print(squared_square_root)

2.9999999999999996
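
One consequence of round-off error is that exact equality tests on floats can fail unexpectedly. As a brief aside, the sketch below compares the result above to 3 both exactly and with `math.isclose` from the standard library, which checks for agreement within a small tolerance. The names `exact_match` and `close_match` are introduced only for illustration.

```python
import math

squared_square_root = (3**0.5)**2

# an exact comparison fails because of the round-off error
exact_match = (squared_square_root == 3)

# math.isclose compares within a relative tolerance (1e-09 by default)
close_match = math.isclose(squared_square_root, 3)

print(exact_match, close_match)
```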


To format float-type numbers with a prescribed number of decimal places, we can use a formatting notation similar to the `printf`-style format specifiers found in C-family languages:
```
'{:.Xf}'.format(number)
```
In this approach, the X indicates the desired number of decimal places. For example:

In [12]:
# format the result above with 1 decimal place
print('If we take the square root of 3 and then square it, we get '+
      '{:.1f}'.format(squared_square_root))

If we take the square root of 3 and then square it, we get 3.0
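
The same format specification can be written in a few equivalent ways. The sketch below shows the `format` method, an f-string (Python 3.6 and later), and the built-in `format` function side by side. Note that formatting produces a new string and leaves the original float unchanged. The variable names are chosen only for illustration.

```python
squared_square_root = (3**0.5)**2

# the format method with 1 and 3 decimal places
one_decimal = '{:.1f}'.format(squared_square_root)
three_decimals = '{:.3f}'.format(squared_square_root)

# an f-string places the value and the format specification in the same braces
fstring_version = f'{squared_square_root:.3f}'

# the built-in format function takes the specification without braces or colon
builtin_version = format(squared_square_root, '.3f')

print(one_decimal, three_decimals, fstring_version, builtin_version)

# the results are strings, and the original float is unchanged
print(type(three_decimals), squared_square_root)
```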


### Padding Integers
When formatting integers as text, it's often desirable to align the numbers so that their values are easy to compare.

For example, consider the following numbers:

In [13]:
print(5)
print(1754)
print(20)
print(5467890)
print(90724)

5
1754
20
5467890
90724


It is easier to compare the magnitudes of the values when they are right-aligned with leading spaces. Here, let's pad each number to a total width of 7 characters:

In [14]:
# pad the integers above to a total width of 7 characters
print('{:7d}'.format(5))
print('{:7d}'.format(1754))
print('{:7d}'.format(20))
print('{:7d}'.format(5467890))
print('{:7d}'.format(90724))

      5
   1754
     20
5467890
  90724
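
The width in the format specification can be combined with a few other options. The sketch below shows left alignment (`<`), centering (`^`), zero padding, and a thousands separator (`,`); the vertical bars are printed only to make the padding visible, and the variable names are illustrative.

```python
value = 1754

right_aligned = '{:7d}'.format(value)   # default for numbers: pad on the left
left_aligned = '{:<7d}'.format(value)   # pad on the right instead
centered = '{:^8d}'.format(value)       # pad evenly on both sides
zero_padded = '{:07d}'.format(value)    # pad with zeros instead of spaces
with_commas = '{:,d}'.format(5467890)   # insert a comma every three digits

# print each result between vertical bars to show the padding
for text in [right_aligned, left_aligned, centered, zero_padded, with_commas]:
    print('|' + text + '|')
```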


#### &#x1F914; Mini-Exercise
Goal: Use the format method to write the following list of numbers padded to a total width of 20 characters with 4 digits after the decimal point.
```
5.2
1754.567
20.43
5467890.121
90724.6
```

To format float values, we use the syntax `'{:N.Mf}'.format(float_value)` where `N` is the minimum total width in characters (including the decimal point) and `M` is the number of decimal places. Your output should show the numbers above aligned at their decimal points.

In [15]:
# enter code here
print('{:20.4f}'.format(5.2))
print('{:20.4f}'.format(1754.567))
print('{:20.4f}'.format(20.43))
print('{:20.4f}'.format(5467890.121))
print('{:20.4f}'.format(90724.6))

              5.2000
           1754.5670
             20.4300
        5467890.1210
          90724.6000
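
Rather than repeating the `print` statement for each number, the same result can be produced with a loop. The sketch below stores the numbers in a list (the names `values` and `formatted` are introduced here for illustration) and applies the same `'{:20.4f}'` specification used in the cell above.

```python
values = [5.2, 1754.567, 20.43, 5467890.121, 90724.6]

# format every value to a width of 20 characters with 4 decimal places
formatted = ['{:20.4f}'.format(value) for value in values]

for line in formatted:
    print(line)
```

Since every string has the same width and the same number of decimal places, the decimal points line up in the same column.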
