Primitive Data Types
--------------------

These are the basic data types from which all of the more complex data structures in Python are built. The basic data types are the following:

* Strings (for text)
* Numeric types (integers and floating-point numbers)
* Booleans


### String:

String variables store textual data: individual characters and sequences of characters. A string is specified by surrounding some text with single `'` or double `"` quotes.

In [None]:
str_1 = "Hello World!"
print str_1

str_1 = 'Hello World!'
print str_1

In [None]:
# Notice that the string below contains the \n character,
# which is the "new line" special character
str_2 = "Hello World!\n\n\nHello World Twice!"
print str_2


In [None]:
# Let's use the \t character which is the special character for tab
str_3 = "Hello\tWorld!\tWe\tare\tfar\taway\n"
print str_3

In [None]:
print "I want to print backslash: \\"

In [None]:
str_4 = 'This is a string within single quotes that can contain "double quotes" as part of the string\n'
print str_4

In [None]:
str_5 = 'If we want to have \'single quotes\' in single quoted string we should escape them\n'
print str_5

In [None]:
str_6 = "Similarly, if we want to have \"double quotes\" in double quoted string we should escape them\n"
print str_6

In [None]:
str_7 = "hello"
str_8 = "world"
hello_world_message = str_7 + " " + str_8 + "!" 
print hello_world_message # note that + concatenates strings

In [None]:
str_9 = '''
If we want to have multiple lines in the string
then we can use triple quotes: This is a multiline
string!
'''
print str_9
    

In [None]:
# Triple quotes are useful for multiline pieces of text (e.g., newspaper articles)
str_10 = """
(CNN)AirAsia Flight QZ8501 climbed rapidly before it crashed, a top Indonesian official said Tuesday, according to The Jakarta Post.

Then the plane stalled, Transportation Minister Ignasius Jonan said at a parliamentary hearing, according to the AFP and Reuters news agencies.

"The plane, during the last minutes, went up faster than normal speed ... after then, it stalled. That is according to the data from the radar," Jonan said, according to the news agencies.
"""
print str_10

#### "Raw" strings

Prefix a string with `r` to make it a `raw` string, in which backslashes are kept literally and sequences like `\t` and `\n` are not interpreted as escape characters. Raw strings are handy when entering regular expressions.

In [None]:
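# Without the r prefix, \t is read as a tab and \\ as a single backslash
print "e.g., type C:\teaching\ instead of C:\\teaching\\"

In [None]:
# With the r prefix, every backslash is printed exactly as typed
print r"e.g., type C:\teaching\ instead of C:\\teaching\\"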
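
As a small illustration of why raw strings help with regular expressions, the sketch below compares an escaped and a raw spelling of the same text and then uses a raw pattern with the standard `re` module. The sample text and variable names are made up for this example, and `print` is called with a single parenthesized argument so that the code runs the same way in Python 2 and Python 3.

```python
import re

# Two spellings of the same text: escaped backslash vs. raw string
escaped = "C:\\teaching"
raw = r"C:\teaching"
print(escaped == raw)   # True

# In a regular string "\t" is one character (a tab);
# in a raw string it is two characters (a backslash and the letter t)
print(len("\t"))        # 1
print(len(r"\t"))       # 2

# Hypothetical sample text: the raw pattern \d+ matches runs of digits
sample_text = "Room 101, floor 3"
numbers = re.findall(r"\d+", sample_text)
print(numbers)          # ['101', '3']
```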

### Accessing parts of the string

**Note: The following instructions will be re-used later for other data structures (e.g., lists), so pay attention!**

Strings can be indexed (subscripted), with the first character having index 0. There is no separate character type; a character is simply a string of size one:

In [None]:
word = 'Python'

In [None]:
word[0]  # character in position 0

In [None]:
word[1]

In [None]:
word[5]  # character in position 5

Indices may also be negative numbers, to start counting from the right:

In [None]:
word[-1]  # last character

In [None]:
word[-2]  # second-last character


In [None]:
word[-6]

In addition to indexing, slicing is also supported. While indexing is used to obtain individual characters, slicing allows you to obtain a substring:

In [None]:
word[0:2]  # characters from position 0 (included) to 2 (excluded)

In [None]:
word[2:5]  # characters from position 2 (included) to 5 (excluded)

In [None]:
word[2:]  # characters from position 2 (included) to the end

In [None]:
word[:3]  # characters from beginning (position 0) to position 3 (excluded)

In [None]:
word[-3:] # last three characters

In [None]:
word[-3:-1] # from the third-last character (included) to the last character (excluded)
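
Two properties of indexing and slicing are worth seeing in code. This is a minimal sketch that reuses the `word` variable from above; the names `i` and `rebuilt` are introduced only for the illustration. A slice up to position `i` plus a slice from position `i` always gives back the original string, and slices quietly tolerate out-of-range positions, whereas indexing a single out-of-range position raises an `IndexError`.

```python
word = 'Python'

# Splitting at any position i and gluing the pieces back gives the original
i = 2
rebuilt = word[:i] + word[i:]
print(rebuilt)          # Python

# Out-of-range slice positions are handled gracefully
print(word[4:42])       # on
print(word[42:])        # prints an empty line (the slice is an empty string)

# Out-of-range indexing of a single character is an error
try:
    word[42]
except IndexError as error:
    print("IndexError: " + str(error))
```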

#### Exercise

* Assign the string 'Dealing with Data' to a Python variable. 
* Print the word 'Dealing' by using the indexing/slicing approach.
* Print the word 'Data' by using the negative indexing/slicing approach.

In [None]:
# your own code here

### Operations on Strings 

We've already seen one of the most common string operators, `+`, used for string concatenation, the indexing operation to get specific characters, and the slicing operation to get substrings. 

Below are some of the more commonly used string operations:

+ `+` : concatenate two strings
+ `len(str)`: length of a string, number of characters
+ `str.upper()`: returns an uppercase version of a string
+ `str.lower()`: returns a lowercase version of a string
+ `haystack.index(needle)`: searches haystack for needle and returns the position of the first occurrence, indexed from 0. Note: raises an error (`ValueError`) if needle isn't found.
+ `str_1.count(str_2)`: counts the number of occurrences of one string in another.
+ `haystack.startswith(needle)`: does the haystack string start with the needle string?
+ `haystack.endswith(needle)`: does the haystack string end with the needle string?
+ `str_1.split(str_2)`: splits the first string at every occurrence of the second string. Returns a list (see below).
+ `==`: are the two operand strings the same?
+ `str.strip()`: removes any whitespace, including newlines, from both ends of the string.

A more complete list of string operations is [available here](http://docs.python.org/2/library/string.html).

In [None]:
word = "Python is the word. And on and on and on and on..." 
print len(word)

In [None]:
print "The length of the word above is", len(word), "characters"

In [None]:
print word.lower()

In [None]:
print word.upper()

In [None]:
word = "Python is the word. And on and on and on and on..." 
ind = word.index("on")
print ind

In [None]:
print "The first time that we see the string on is at position", word.index("on")

In [None]:
first_appearance = word.index("on")
second_appearance = word.index("on",first_appearance+1)
print "The second time that we see the string on is at position", second_appearance

In [None]:
# Looking for the string "on" in the second half of the big string called "word"
midpoint = len(word) // 2 # finds the middle of the string word (integer division)
second_half_appearance = word.index("on",midpoint)
print "First time that we see 'on' in the second half: ", second_half_appearance

In [None]:
# We get an error when the string that we are looking for does not appear
# anywhere: For example, we will look for the string "Panos"
word = "Python is the word. And on and on and on and on..." 
word.index("Panos")
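
If a missing substring should not stop the program, there are gentler ways to ask the same question. The sketch below shows one possible approach, not the only one: the `in` operator tests for presence, `find` behaves like `index` but returns -1 when nothing is found, and a `try`/`except` block catches the `ValueError` that `index` raises. The variable `position` is a name chosen for this example.

```python
word = "Python is the word. And on and on and on and on..."

# Test for presence without asking for a position
print("Panos" in word)        # False

# find() works like index(), but returns -1 instead of raising an error
print(word.find("Panos"))     # -1
print(word.find("on"))        # 4

# Alternatively, catch the error that index() raises
try:
    position = word.index("Panos")
except ValueError:
    position = None
print(position)               # None
```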

In [None]:
word = "Python is the word. And on and on and on and on..."
lookfor = "PYTHON"
count = word.count(lookfor)
print "We see the string '" + lookfor + "' this many times:", count
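
The count above is 0 because string comparisons are case-sensitive: "PYTHON" and "Python" are different strings. One common workaround, sketched here under the assumption that case should be ignored entirely, is to lowercase both strings before counting. The name `count_ignoring_case` is introduced only for this example.

```python
word = "Python is the word. And on and on and on and on..."
lookfor = "PYTHON"

# Case-sensitive: "PYTHON" never appears in word
print(word.count(lookfor))                  # 0

# Lowercase both sides so that case no longer matters
count_ignoring_case = word.lower().count(lookfor.lower())
print(count_ignoring_case)                  # 1

# "on" appears once inside "Python" and four more times as its own word
print(word.count("on"))                     # 5
```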

In [None]:
str_1 = "Hello"
str_2 = "World"
print "concatenation:"
print str_1 + " " + str_2
print str_1 + " everybody"

In [None]:
print "length:"
print len(str_1)
print len(str_1 + " " + str_2)

In [None]:
print "string casing:"
print str_1.upper()
print "HELLO".lower()

In [None]:
print "string indexing:"
print "hello".index("ll")
print "hello".upper().index("LL")

In [None]:
print "string count:"
print str_1.count("l")
print str_1.count("ll")

In [None]:
print "startswith & endswith:"
print "hello".startswith("he")
print "hello".endswith("world")

In [None]:
print "split:"
print "practical data science".split(" ")
print "hello".split(" ")
print "practical data science".split("a")

In [None]:
str_1 = "hello"
print "equality:"
print str_1 == "hello"

print str_1 == "Hello"

In [None]:
mystring1 = "practical data science"
mylist1 = mystring1.split(" ")
print mystring1
print mylist1
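
The one operation from the list above without an example so far is `strip()`. Here is a brief sketch; the messy input string is invented for the illustration.

```python
# Hypothetical input with stray spaces, a tab and newlines around the text
messy = "  \t practical data science \n\n"

clean = messy.strip()
print(">" + clean + "<")   # >practical data science<

# Only the ends are trimmed; whitespace inside the string is left untouched
print(len(messy))          # 29
print(len(clean))          # 22
```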

### String Formatting:

Often one wants to embed other information into strings, sometimes with special formatting constraints. In Python, one may insert special formatting characters into a string that specify what type of data should be inserted, where it should go, and how its "stringified" form should be formatted. For instance, one may wish to insert an integer into a string:

In [None]:
message = 2
print "To be or not %d be" % message

Note the `%d` formatting (or conversion) specifier in the string. It states that you wish to insert an integer value at that position (more on these conversion specifiers below). The value to insert follows the string, separated from it by a `%` character. If you wish to insert more than one value into the string being formatted, place them in a comma-separated list, surrounded by parentheses, after the `%`:

In [None]:
print "%d be or not %d be" % (2, 2)

In detail, a conversion specifier contains two or more characters, which must occur in the following order:

+ The `%` character, which marks the start of the specifier
+ Optional conversion flags, such as `0`, which pads numeric values with zeros instead of spaces
+ An optional minimum field width. The value being inserted is padded to be at least this wide
+ An optional precision, given as a "`.`" followed by the number of digits of precision
+ The conversion type, which defines the type of the value that we are printing (specified below)
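
To make the parts of a specifier concrete, the following sketch formats the same value with different widths and precisions. The values and the variable name are chosen only for illustration; the `0` and `-` characters placed before the width are optional flags for zero-padding and left-alignment. The `>` and `<` characters just make the padding visible.

```python
value = 3.14159

print(">%.1f<" % value)     # >3.1<        precision only
print(">%8.3f<" % value)    # >   3.142<   width 8, padded with spaces
print(">%08.3f<" % value)   # >0003.142<   width 8, padded with zeros
print(">%-8.3f<" % value)   # >3.142   <   width 8, left-aligned
print(">%5d<" % 42)         # >   42<      width also applies to integers
```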

In [None]:
print "Result: >%06.3f<" % (100.0/23)

In [None]:
print "Result: >%6.3f<" % (100.0/23)

In [None]:
print "Result: %.2f" % (100.0/23)

In [None]:
print "Result: <%010.4f>" % (10000000.0/7)
print "Result: <%010.4f>" % (100.0/7)

In [None]:
print "Result: >%15.10f<" % (100.0/23000000)

Some common conversion type characters are:

+ `s`: String
+ `d`: Signed integer decimal.
+ `f`: Floating point decimal format.
+ `e`: Floating point exponential format (lowercase).
+ `E`: Floating point exponential format (uppercase).

In [None]:
print "%d %s or not %04.3f %s" % (2, "be", 10.0/3, 'b')
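
The conversion characters listed above can be compared side by side. This is an illustrative sketch with made-up values; note that `%d` drops the fractional part of a float rather than rounding it.

```python
number = 12345.678

print("%s" % "data")      # data
print("%d" % 42)          # 42
print("%d" % 3.9)         # 3
print("%f" % number)      # 12345.678000
print("%e" % number)      # 1.234568e+04
print("%E" % number)      # 1.234568E+04
print("%.2e" % number)    # 1.23e+04
```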

For a more detailed treatment of string formatting options, [see here](http://docs.python.org/release/2.5.2/lib/typesseq-strings.html).