# Tutorial: Editing strings

Sometimes you have to take a string and edit it. Strings are IMMUTABLE: you can't edit them in place. So all editing means copying the pieces you want to keep into a new string.

There are a TON of existing routines for editing and searching and taking apart strings:
 - Change upper/lower case; check whether it's a letter, a number, or white space; take out special characters
 - Find a character or a substring in a string; check whether the string contains these characters
 - Split up a string into a list of strings
 
See https://docs.python.org/3/library/stdtypes.html#string-methods
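Here is a quick taste of a few of the routines from the list above. It's a sketch, and the sample strings are made up for illustration:

```python
sample = "Hello World 42"

# Upper/lower case (each returns a new string; sample itself is unchanged)
shouting = sample.upper()
quiet = sample.lower()
print(shouting, "|", quiet)

# Is it a letter, a number, or white space? (True only if EVERY character passes)
print("abc".isalpha(), "42".isdigit(), " \t".isspace())

# Take out special characters: keep only letters, digits and spaces
no_specials = "".join(c for c in "Hi! #1 fan?" if c.isalnum() or c.isspace())
print(no_specials)

# Find a substring (index of the first match, or -1 if it isn't there)
index_world = sample.find("World")
# ... or just ask whether the string contains it
has_42 = "42" in sample
print(index_world, has_42)

# Split up a string into a list of strings (by default, on any white space)
words = sample.split()
print(words)
```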

This tutorial won't cover all of them, just some of the most useful. In general, there will not be a single command to "Remove the apostrophes and the dashes from the names, make sure they're all lower case, and merge first and middle names". Instead, you have to figure out which 2-3 commands get you close (e.g., strip, split, find, lower) and then apply them one after the other (e.g., convert everything to lower case, then strip out the dashes, then remove the spaces), adding some logic where needed (e.g., handling a name that splits into 2 strings differently from one that splits into 3).
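Here is one way the name-cleaning example might look, chaining a few commands with a bit of logic. This is a sketch: the function name `clean_name`, the sample names, and the rule "three parts means first + middle + last" are all assumptions made for illustration.

```python
def clean_name(full_name):
    """Hypothetical helper: lower case, drop apostrophes and dashes,
    then merge first and middle names if there are three parts."""
    # Steps 1 and 2: lower case, then remove the ' and - characters
    cleaned = full_name.lower().replace("'", "").replace("-", "")
    # Step 3: split on white space into a list of names
    parts = cleaned.split()
    # Step 4, the logic: 3 parts is assumed to be first, middle, last
    if len(parts) == 3:
        return [parts[0] + parts[1], parts[2]]
    return parts

print(clean_name("Mary Jo O'Brien-Smith"))
print(clean_name("Jean-Luc Picard"))
```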

## Example: Editing a string
Goal: convert "SOUNDS LIKE I'M YELLING!" to "Sounds like I'm not yelling"

Problem breakdown:
- First, convert to lower case
- Next, capitalize the first letter (there's a function for that)
- Next, fix the "i'm" with a replace
- Next, stick in a "not"
- Finally, take the ! out

Obviously, if I knew what the sentence was, it would be a lot easier to just, well, make the new sentence... But imagine, instead, that you're writing a general-purpose routine to convert all-caps strings to lower-case ones and fix grammar errors...

TODO
- Try taking out all of the spaces

In [None]:
# An entirely upper-case string
my_string_upper_case = "SOUNDS LIKE I'M YELLING!"

In [None]:
# convert to lower case - lower makes a copy of the string, converts the upper case letters, then returns the copy
my_string_lower_case = my_string_upper_case.lower()

# Note that my_string_upper_case hasn't changed...
print(f"This is yelling: {my_string_upper_case}, this is not: {my_string_lower_case}")

In [None]:
# Oops, should capitalize first letter - note capitalize, and not upper:
my_string_cap = my_string_lower_case.capitalize()
print(my_string_cap)

In [None]:
# ... and replace the lower-case i with an upper-case one
#  Notice the string uses double quotes ("), so that ' can appear inside it as a character
my_string_with_i_fixed = my_string_cap.replace("i'm", "I'm")
print(my_string_with_i_fixed)

In [None]:
# add in the "not" by finding the last space (rfind searches from the right)
index_last_space = my_string_with_i_fixed.rfind(" ")
# Slicing to get out the left and right halves of the string
#   [0:index] gets characters 0 through index - 1
#   [index:] gets characters from index through the end
my_not_yelling_string = my_string_with_i_fixed[0:index_last_space] + " not" + my_string_with_i_fixed[index_last_space:]
print(my_not_yelling_string)

In [None]:
# ... without the ! at the end - use [index:-1] to stop one character before the end (drops the last character)
my_not_yelling_string_better = my_string_with_i_fixed[0:index_last_space] + " not" + my_string_with_i_fixed[index_last_space:-1]
print(my_not_yelling_string_better)

... Yes, you can do all of that editing in one line of really confusing Python. Just don't. BUT, you might try combining two steps at the same time...
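For example, because each method returns a new string, you can call the next method directly on the result. Here's a sketch combining a couple of steps at a time. (`rstrip` isn't used above; it removes the given characters from the right-hand end of a string.)

```python
my_string_upper_case = "SOUNDS LIKE I'M YELLING!"

# Two steps at once: lower() returns a new string, and capitalize() runs on that
my_string_cap = my_string_upper_case.lower().capitalize()
print(my_string_cap)

# Two more: fix the "i'm", then strip the "!" off the right-hand end
my_string_fixed = my_string_cap.replace("i'm", "I'm").rstrip("!")
print(my_string_fixed)
```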

## GOTCHA - Common error when editing a string: Can't edit in place
You can't directly edit a string. You'll get a very confusing error message that says
 TypeError: 'str' object does not support item assignment

Translate that as: You can't assign to an individual character in a string

In [None]:
my_not_yelling_string[1] = 's'
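To "change" a character, build a new string instead. Below is a sketch of two common ways: one with slicing, and one that copies the characters into a list (lists, unlike strings, can be edited in place) and then joins them back together. The try/except just shows the error without stopping the cell.

```python
my_string = "Sounds like I'm not yelling!"

# Assigning to one character raises the TypeError described above
error_message = ""
try:
    my_string[0] = 's'
except TypeError as err:
    error_message = str(err)
    print("Can't do that:", error_message)

# Option 1: the new character, then a slice of everything after index 0
my_string_fixed = 's' + my_string[1:]
print(my_string_fixed)

# Option 2: edit a list of the characters, then join them into a new string
chars = list(my_string)
chars[0] = 's'
my_string_from_list = "".join(chars)
print(my_string_from_list)
```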

# Tutorial: Splitting up strings
Often when you read in strings from files (or people type in strings), you'll get strings that you have to "split up", for example, to get all the numbers separated by commas OR to ignore a descriptor at the beginning.

For these examples we'll have pre-defined strings that are "typed in" (rather than reading them from a file) just to keep things simple. In reality, you usually have to do this when you can't edit the files directly.

An aside: When you only have one file you're "cleaning up" for processing (as opposed to writing some code to process a whole bunch of similar files) it is often a lot easier to open the file in your favorite spreadsheet editor and just... edit it there. 

TODO:
- Create a string with semi-colons separating the values
- Use split to pull it apart into a list of numbers

## Tutorial: Split example
This first bit of code is just to make the string to play with.

In [None]:
# Make a list of numbers
list_of_numbers = [0.3 + 0.1 * n for n in range(0, 10)]
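# Put a small number in the middle - note it is a float (3.3e-3), not the
#   string "3.3e-3", which the {:.2} formatting below would cut down to "3."
list_of_numbers.insert(3, 3.3e-3)

# Turn it into a string with a tab at the beginning (the \t)
#   {:.2} prints each number with at most 2 significant digits
str_list_of_numbers = "\t" + ", ".join(f"{x:.2}" for x in list_of_numbers)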
print(str_list_of_numbers)

In [None]:
# Take it back apart again using split, looking for commas
my_list_back_as_strs = str_list_of_numbers.split(',')

# Convert back to a list of numbers - note that float() is smart enough to take out white space
recover_list = [float(item) for item in my_list_back_as_strs]
print(recover_list)

In [None]:
# If that list comprehension is confusing, this loop does exactly the same thing
recover_list = []
for item in my_list_back_as_strs:
    recover_list.append(float(item))
print(recover_list)    
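`float()` throws the white space away for you, but if you want to keep the pieces as strings (names, say), you have to clean them up yourself. Here's a small sketch using `strip`, which returns a copy with white space removed from both ends:

```python
str_with_tab = "\t0.3, 0.4, 0.5"

# split leaves the tab and the spaces attached to the pieces
pieces = str_with_tab.split(',')
print(pieces)

# strip() removes leading and trailing white space from each piece
clean_pieces = [piece.strip() for piece in pieces]
print(clean_pieces)
```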

## Tutorial: Find example
We have a string made up of several lines; each line we care about consists of a descriptive name followed by a colon followed by a number
 - Split the string at the end of each line (the \n), then split up each line.
 - If the line has a :, then get out the number; otherwise, ignore it
 - Note that there is a more complete example of this (for building a dictionary) in the dictionary tutorial
 
TODO
- Try printing out which item it is when you find one
- To test it, reverse the order in the string and see if it still works

In [None]:
my_str_to_parse = "# This is a header\nItem 1: 0.3\nBlank line\nItem 2: 1.3\n"
print(my_str_to_parse)

In [None]:
# First, split the string into lines, breaking at each \n
my_lines = my_str_to_parse.split('\n')
print(my_lines)
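Notice the empty string at the end of the list: the text ends with a \n, so `split('\n')` finds one more (empty) piece after it. The loop in the next cell skips it because it has no colon, but `splitlines()` is an alternative that doesn't produce it. A quick comparison:

```python
my_str_to_parse = "# This is a header\nItem 1: 0.3\nBlank line\nItem 2: 1.3\n"

lines_split = my_str_to_parse.split('\n')
lines_splitlines = my_str_to_parse.splitlines()

# split('\n') keeps an empty string after the final \n; splitlines() does not
print(len(lines_split), repr(lines_split[-1]))
print(len(lines_splitlines), repr(lines_splitlines[-1]))
```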

In [None]:
# An empty list of numbers
my_numbers = []
# Now loop through each line and look for a colon
for line in my_lines:
    # You can use this notation to check whether the string ":" is in the string line
    if ":" in line:
        # split again - you could also use find and the resulting index
        two_parts = line.split(":")
        # Don't forget to convert the string to a number
        #   the [-1] gets the last (here, the second) element in the list
        the_number = float(two_parts[-1])
        my_numbers.append(the_number)
print(my_numbers)
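The comment in the loop mentions `find` as an alternative to a second split. Here's a sketch of that version: `find` returns the index of the first ":" (or -1 if there isn't one), and slicing from just past that index gives the number part.

```python
my_str_to_parse = "# This is a header\nItem 1: 0.3\nBlank line\nItem 2: 1.3\n"

my_numbers_with_find = []
for line in my_str_to_parse.split('\n'):
    # Index of the first ":" in the line, or -1 if the line has none
    index_colon = line.find(":")
    if index_colon != -1:
        # Everything after the colon is the number (float ignores the leading space)
        my_numbers_with_find.append(float(line[index_colon + 1:]))
print(my_numbers_with_find)
```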

## Tutorial: Editing a file name
This is pretty common - take the last 4 characters off and replace them with a different file type. Try changing .txt to .py and edit the second line to do the right thing.

In [None]:
file_name_orig = "my_file.txt"
file_name_new = file_name_orig[0:-4] + ".csv"
print(file_name_new)
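Chopping off a fixed 4 characters only works when every extension is 3 letters long. Here's a sketch of two ways to handle an extension of any length: `rfind` for the last "." (as in the "not" example above), or the standard library's `os.path.splitext`.

```python
import os.path

file_name_orig = "my_file.py"

# Option 1: find the last "." with rfind and keep everything before it
index_dot = file_name_orig.rfind(".")
if index_dot == -1:
    # No extension at all, so keep the whole name
    base_name = file_name_orig
else:
    base_name = file_name_orig[0:index_dot]
file_name_new = base_name + ".csv"
print(file_name_new)

# Option 2: let the standard library split off the extension (the "." stays with ext)
root, ext = os.path.splitext(file_name_orig)
print(root + ".csv", "(old extension was " + ext + ")")
```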