## The String Data Type

Text is represented in programs as the _string_ data type. A string can be thought of as a sequence of characters in quotation marks. Strings can be saved in variables just as any other data type.

```python
str1 = "Hello"
str2 = 'spam'
```

We can perform many operations on strings. Through _indexing_ we can access the individual characters that make up a string. We can think of the positions in a string as being numbered from the left, starting with 0. The general form for indexing is:

```python
string[index]
```
Python also allows indexing from the right end of a string using negative indexes:

```python
string[-1]
```
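
As a small illustration of both forms of indexing, the sketch below uses a made-up string, greet (the variable names are chosen here only for the example):

```python
# Illustrative only: greet is a made-up example string.
greet = "Hello Bob"

first_char = greet[0]       # position 0 is the leftmost character
fourth_char = greet[3]      # positions count up from 0, so this is the 4th character
last_char = greet[-1]       # -1 is the rightmost character
third_from_end = greet[-3]  # negative indexes count back from the right end

print(first_char, fourth_char, last_char, third_from_end)  # H l b B
```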

It's possible to access a sequence of characters or _substring_ from a string. This is called _slicing_. Slicing is a way of indexing through a range of positions in the string. The general form for slicing is:

```python
string[start:end]
```

A slice produces the substring starting at the position given by start and running up to, but _not including_, the end position. If either the start or end position is missing, the start or end of the string is assumed by default.
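
A few slices of a made-up example string show how the end position is excluded and how the defaults behave (the names here are illustrative):

```python
# Illustrative only: greet is a made-up example string.
greet = "Hello Bob"

prefix = greet[0:3]     # positions 0, 1 and 2; position 3 is excluded
tail = greet[5:9]       # starts at the space in position 5
no_start = greet[:5]    # start omitted, so the slice begins at position 0
no_end = greet[5:]      # end omitted, so the slice runs to the end of the string
whole = greet[:]        # both omitted: the whole string

print(prefix)    # Hel
print(no_start)  # Hello
print(whole)     # Hello Bob
```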

The string data type supports operations for putting strings together. Concatenation joins, or glues, two strings together with the + operator. Repetition uses the * operator to build a string from multiple concatenations of a string with itself. Another useful function is len, which tells us how many characters are in a string.
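
These three operations can be tried on a couple of short strings; the values below are made up for illustration:

```python
# Concatenation glues two strings end to end.
joined = "spam" + "eggs"

# Repetition concatenates a string with itself the given number of times.
repeated = 3 * "spam"

# len counts the characters in a string.
length = len(joined)

print(joined)    # spameggs
print(repeated)  # spamspamspam
print(length)    # 8
```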

_____

## Simple String Processing

We're going to use the operations we just learned to create a program that generates usernames from a first and last name entered by the user.

```python

def main():
    print('This program generates computer usernames.\n')
    
    # get user's first and last names
    first = input('Please enter your first name (all lowercase): ')
    last = input('Please enter your last name (all lowercase): ')
    
    # concatenate first initial with 7 chars of the last name
    uname = first[0] + last[:7]
    
    print('Your username is: %s' % (uname))
```

The program first uses input to get strings from the user. Then indexing, slicing and concatenation are combined to produce the username.
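
To try the username rule without typing input, the same expression can be wrapped in a function. This is only a sketch: make_username is a hypothetical helper name, not part of the program above. It also shows that slicing a last name shorter than seven characters simply returns the whole name.

```python
def make_username(first, last):
    """Return the first initial followed by up to 7 characters of the last name."""
    return first[0] + last[:7]

long_name = make_username("zelda", "zimmerman")
short_name = make_username("john", "doe")

print(long_name)   # zzimmerm
print(short_name)  # jdoe
```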

Now let's create a program that prints the abbreviation of the month corresponding to a given month number. For example, entering the number 3 prints Mar. We can store all the month abbreviations in one big string:

```python
months = "JanFebMarAprMayJunJulAugSepOctNovDec"
```

We can now look up a particular month by slicing out the appropriate substring. Because every abbreviation is three characters long, each one starts at a position that is a multiple of 3. If we subtract 1 from the user's input and multiply by 3, we get the starting position of the month.

```python
# month.py
# A program to print the abbreviation of a month, given its number

def main():
    months = "JanFebMarAprMayJunJulAugSepOctNovDec"
    n = int(input('Enter a month number (1-12): '))
    pos = (n-1) * 3
    monthAbbrev = months[pos:pos+3]
    print('The month abbreviation is %s' % (monthAbbrev))
```
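
The lookup arithmetic can be checked by hand: for month 3, pos is (3-1) * 3 = 6, and the slice from position 6 up to 9 is Mar. A minimal sketch that wraps the same calculation in a function might look like the following. The name month_abbrev and the range check are additions made for this example; they are not part of month.py.

```python
MONTHS = "JanFebMarAprMayJunJulAugSepOctNovDec"

def month_abbrev(n):
    """Return the three-letter abbreviation for month number n (1-12)."""
    # Assumption for this sketch: reject numbers outside 1-12 explicitly.
    if not 1 <= n <= 12:
        raise ValueError("month number must be between 1 and 12")
    pos = (n - 1) * 3          # each abbreviation starts at a multiple of 3
    return MONTHS[pos:pos + 3]

print(month_abbrev(1))   # Jan
print(month_abbrev(3))   # Mar
print(month_abbrev(12))  # Dec
```

Without the range check, an out-of-range number such as 13 would not raise an error; the slice would just fall past the end of the string and produce an empty string.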

_____

## Lists as Sequences

The operations that apply to strings can also be applied to lists, because lists are also sequences. They can be concatenated, repeated, indexed and sliced. Lists are more general than strings: strings are always sequences of characters, whereas lists can be sequences of arbitrary objects. You can create a list that contains multiple data types. Using a list of strings, we can simplify the month.py program.

```python

def main():
    months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
    n = int(input('Enter a month number (1-12): '))
    print('The month abbreviation is %s.' % (months[n-1]))
```
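
The sequence operations work on lists the same way they work on strings. The short sketch below uses made-up values, including a list that mixes several data types:

```python
# Concatenation and repetition build new lists.
nums = [1, 2] + [3, 4]
zeros = [0] * 3

# A list may hold items of different types.
mixed = [1, "spam", 4.5, True]

second_item = mixed[1]    # indexing starts at 0, as with strings
middle = mixed[1:3]       # slicing returns a new list
count = len(mixed)        # len counts the items

print(nums)         # [1, 2, 3, 4]
print(zeros)        # [0, 0, 0]
print(second_item)  # spam
print(middle)       # ['spam', 4.5]
print(count)        # 4
```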

Lists, just like strings, are indexed starting at 0. While strings and lists are both sequences, lists are mutable and strings are not. This means that the value of an item in a list can be modified with an assignment statement. Strings, on the other hand, cannot be modified "in place".


```python
>>> myList = [34,26,15,10]
>>> myList[2]
15
>>> myList[2] = 0
>>> myList
[34, 26, 0, 10]
```

We create a list containing four numbers. Indexing the list at position 2 returns the item 15, and we then assign 0 to that position. After the assignment, evaluating the list shows that the item at position 2 has changed.
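
For contrast, the sketch below tries the same kind of assignment on a string. Python raises a TypeError, so changing a string means building a new one from pieces of the old (the variable names are illustrative):

```python
word = "Hello"

try:
    word[0] = "J"              # strings do not support item assignment
except TypeError as err:
    error_message = str(err)   # keep the message to show what went wrong
    print("Cannot modify a string in place:", error_message)

# Build a new string instead of modifying the old one.
new_word = "J" + word[1:]

print(word)      # Hello
print(new_word)  # Jello
```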

_____

## String formatting

The format method is built into Python strings. The idea is that the string serves as a sort of template, and the values supplied as parameters are plugged into this template to form a new string. String formatting takes the form:

```python
template-string.format(values)
```

The curly braces ( {} ) inside the template string mark 'slots' into which the provided values are inserted. The information inside the curly braces tells which value goes in the slot and how the value should be formatted. The slot descriptions will always have the form:

```python
{index:format-specifier}
```

The index tells which of the parameters is inserted into the slot. _As of Python 3.1 the index portion of the slot description is optional. When the indexes are omitted, the parameters are just filled into the slots in a left-to-right fashion_. 

```python
"Hello {0} {1}, you may have won {2}.".format('Mr.','Smith',10000)
# --> Hello Mr. Smith, you may have won 10000.
```
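
A few more slot descriptions are sketched below, with made-up values. They show slots without indexes, indexes used out of order, and two common format specifiers (a width, and a fixed number of decimal places):

```python
# Indexes omitted: values fill the slots from left to right.
no_index = "{} {}, you may have won ${}.".format("Mr.", "Smith", 10000)

# Indexes choose which parameter goes in which slot.
swapped = "{1} {0}".format("world", "hello")

# 0.2f: fixed-point notation with two digits after the decimal point.
two_places = "{0:0.2f}".format(3.14159)

# 5: a minimum width of five characters; numbers are right-justified,
# strings are left-justified.
padded_number = "{0:5}".format(7)
padded_string = "{0:5}".format("hi")

print(no_index)     # Mr. Smith, you may have won $10000.
print(swapped)      # hello world
print(two_places)   # 3.14
print(repr(padded_number))  # '    7'
print(repr(padded_string))  # 'hi   '
```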


_____

## File Processing

One critical feature of any word processing program is the ability to store and retrieve documents as files on disk.


### Multi-line Strings

Conceptually, a file is a sequence of data that is stored in secondary memory. Files usually contain multiple lines and can hold multiple kinds of data. We can think of a text file as a string that happens to be stored on disk. Because files typically contain more than one line, Python uses a special character, \n, to denote a new line. The newline character is an end-of-line marker, so a multi-line file contains many of them.
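
A string with embedded newline characters behaves like the contents of a small file. The text below is made up for illustration:

```python
# Four lines of text, each ended by a newline character (the third line is empty).
text = "Hello\nWorld\n\nGoodbye 32\n"

print(text)  # each \n starts a new line in the printed output

newline_count = text.count("\n")  # one marker per line
lines = text.splitlines()         # the lines without their newline characters

print(newline_count)  # 4
print(lines)          # ['Hello', 'World', '', 'Goodbye 32']
```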

### File Processing

To begin processing a file, we first need to associate a file on disk with an object in a program. This is called _opening_ a file. Once the file has been opened, its contents can be accessed through the associated file object. Second, we need a set of operations that can manipulate the file object. This includes operations to read information from a file and write new information to a file. Finally, when we are done with the file, we need to close it.

Working with text files is easy in Python. The first step is to create a file object corresponding to a file on disk.

```python
variable = open(filename,mode)
```
Here filename is a string that gives the name of, or the path to, the file on disk. The mode parameter is a string; here we use either "r" or "w", depending on whether we intend to read from the file or write to it.

Python provides three related operations for reading information from a file:

* file.read() returns the entire remaining contents of the file as a single string.

* file.readline() returns the next line of the file, that is, all text up to and including the next newline character.

* file.readlines() returns a list of the remaining lines in the file. Each list item is a single line, including the newline character at the end.
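
The word "remaining" matters: each read picks up where the previous one stopped. The sketch below demonstrates this on a small temporary file. The file name sample.txt and its contents are made up, and the standard tempfile module is used only so the example cleans up after itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.txt")   # hypothetical file name

    # Create a three-line file to read back.
    outfile = open(path, "w")
    outfile.write("first\nsecond\nthird\n")
    outfile.close()

    infile = open(path, "r")
    first_line = infile.readline()    # the next line, newline included
    rest_lines = infile.readlines()   # the lines that are left
    leftover = infile.read()          # nothing remains, so an empty string
    infile.close()

    # Reopening the file starts again from the beginning.
    infile = open(path, "r")
    everything = infile.read()
    infile.close()

print(repr(first_line))   # 'first\n'
print(rest_lines)         # ['second\n', 'third\n']
print(repr(leftover))     # ''
print(repr(everything))   # 'first\nsecond\nthird\n'
```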

Here's an example of a program that prints the contents of a file using the read operation:

```python 
#printfile.py
#prints a file to the screen

def main():
    fname = input('Enter filename: ')
    infile = open(fname,"r")
    data = infile.read()
    print(data)
    infile.close()
```
To loop through the lines of a file:

```python

infile = open(someFile,"r")
for line in infile:
    # process the line here
    pass
infile.close()
```
Opening a file for writing prepares the file to receive data. If no file with the given name exists, a new file will be created. However, if a file with the given name _does_ exist, Python will delete it and create a new, empty file. One way to write information into a text file is to use the print function.

```python
print(....,file=outfile)
```
This behaves like the regular print function, except that the result is sent to outfile rather than to the screen.
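
The following sketch writes to a temporary file with print and reads the result back. It then opens the same file for writing a second time to show that the earlier contents are gone. The file name out.txt is made up, and the standard tempfile module is used only for cleanup.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "out.txt")   # hypothetical file name

    # print adds spaces between items and a newline at the end, as usual.
    outfile = open(path, "w")
    print("Hello", "World", file=outfile)
    print(3 + 4, file=outfile)
    outfile.close()

    infile = open(path, "r")
    first_contents = infile.read()
    infile.close()

    # Opening for writing again discards what was there before.
    outfile = open(path, "w")
    print("Only this line", file=outfile)
    outfile.close()

    infile = open(path, "r")
    second_contents = infile.read()
    infile.close()

print(repr(first_contents))   # 'Hello World\n7\n'
print(repr(second_contents))  # 'Only this line\n'
```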

### Example Problem: Batch Usernames

To put everything together, we are going to redo the username generation program. Instead of taking user input for the first and last name, we're going to load a batch of names from a file and create many usernames by looping through the file. Each line of the input file will contain a first and last name separated by one or more spaces.

```python
#user.py
# Program to create a file of usernames in batch mode.

def main():
    
    print('This program creates a file of usernames from a file of names.')
    
    #get the file names
    infileName = input('What file are the names in? ')
    outfileName = input('What file should the usernames go in? ')
    
    # open the files
    infile = open(infileName,"r")
    outfile = open(outfileName,"w")
    
    #process each line of the input file
    for line in infile:
        #get the first and last name
        first,last = line.split()
        #create the username
        uname = (first[0] + last[:7]).lower()
        #write it to the output file
        print(uname,file=outfile)
    
    #close both files
    infile.close()
    outfile.close()
```
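
One possible variation is sketched below. It is not the program above, and it makes a few assumptions: the function name make_usernames is hypothetical, lines that do not contain exactly two names are skipped, and the files are opened with Python's with statement, which closes them automatically. The sample names and file names are made up so the sketch can run on its own.

```python
import os
import tempfile

def make_usernames(in_path, out_path):
    """Write one lowercase username per valid line of the input file."""
    with open(in_path, "r") as infile, open(out_path, "w") as outfile:
        for line in infile:
            parts = line.split()        # split on any run of whitespace
            if len(parts) != 2:
                continue                # assumption: skip blank or malformed lines
            first, last = parts
            uname = (first[0] + last[:7]).lower()
            print(uname, file=outfile)

# Try it on a small temporary file of made-up names.
with tempfile.TemporaryDirectory() as tmpdir:
    names_path = os.path.join(tmpdir, "names.txt")
    users_path = os.path.join(tmpdir, "usernames.txt")

    with open(names_path, "w") as f:
        print("Alice Zimmerman", file=f)
        print("Bob  Smith", file=f)     # two spaces between the names
        print("", file=f)               # a blank line, which is skipped

    make_usernames(names_path, users_path)

    with open(users_path, "r") as f:
        result = f.read()

print(result)
```

With the sample names above, the output file contains azimmerm and bsmith, one per line.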

                        

## Chapter Summary

* A string is a sequence of characters. String literals can be delimited with either single or double quotes.

* Strings and lists can be manipulated with the built-in sequence operations for concatenation (+), repetition (*), indexing ([]), slicing ([:]), and length (len()). A for loop can be used to iterate through the characters of a string, the items in a list or the lines of a file.

* One way of converting numeric information into string information is to use a string or a list as a lookup table.

* Lists are more general than strings:
  - Strings are always sequences of characters, whereas lists can contain values of any type.
  - Lists are mutable, which means that items in a list can be modified by assigning new values.

* Strings are represented in the computer as numeric codes. ASCII and Unicode are compatible standards that are used for specifying the correspondence between characters and underlying codes. Python provides the ord and chr functions for translating between Unicode codes and characters.

* Python string and list objects include many useful built-in methods for string and list processing.

* Program input and output often involve string processing. Python provides numerous operators for converting back and forth between numbers and strings. The string formatting method (format) is particularly useful for producing nicely formatted output.

* Text files are multi-line strings stored in secondary memory. A text file may be opened for reading or writing. When opened for writing, the existing contents of the file are erased. Python provides three file-reading methods: read(), readline() and readlines(). It is also possible to iterate through the lines of a file with a for loop. Data is written to a file using the print function. When processing is finished, a file should be closed.
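
As a brief illustration of the ord and chr functions mentioned in the summary, the sketch below converts a made-up string to its numeric codes with a for loop and then back again:

```python
message = "Hi"

# ord gives the numeric code for a single character.
codes = []
for ch in message:
    codes.append(ord(ch))

# chr goes the other way, from a code to a character.
rebuilt = ""
for code in codes:
    rebuilt = rebuilt + chr(code)

print(codes)     # [72, 105]
print(rebuilt)   # Hi
print(ord("A"))  # 65
print(chr(97))   # a
```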