
# Why use code

## Let's go!

As you have seen from the **Instant Data Science** lesson, programming is a powerful tool for answering questions about data.  It allows us to collect, clean up and format our data and then perform calculations on that data.  

Much of our digital information is in the form of text: song lyrics and emails, for example. To clean up and format that text with Python, we need to become familiar with our first type of data, the string.

### Objectives
* Working with our first data type, strings
* Learning about string methods that Python provides us
* Learning how to discover new methods

## Working with text

A lot of information in the world is in the form of text. If we want to capture this information and operate on it, we should become familiar with an entire datatype in Python dedicated to it: the string.

**Note:** *If you are viewing this in markdown format (files that end in .md), the first gray box shows the **input** and what follows it is the **output**. The rest of the lab(s) will follow this pattern, and we've marked the input for the first few examples with the comment `#input`.*


In [3]:
'Homer Simpson' #input

'Homer Simpson'

When programmers say "string", what they mean is text.  When programmers say datatype, they just mean type of data.  For example, here is another datatype in Python, a number.


In [4]:
5

5

We can discover the type of a piece of data by calling, or executing, the built-in `type` function.

Let's look at an example below:

In [5]:
type('Data Scientist') #input

str
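As a small sketch of the same idea, the snippet below calls `type` on both a string and a number. The variable names `text_type` and `number_type` are only illustrative. Note that a notebook cell displays `str`, while `print` shows the longer form `<class 'str'>`.

```python
# type() reports the datatype of whatever value we hand it.
text_type = type('Data Scientist')   # a string
number_type = type(5)                # a whole number

print(text_type)    # <class 'str'>
print(number_type)  # <class 'int'>
```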

Python needs to be told when it is looking at a string. To create a string we cannot simply type letters. Instead, we need to be very explicit and tell Python it is about to see some text. We do this by surrounding our text with single or double quotes, e.g. "Simpson" and 'Simpson' mean exactly the same thing to Python. If we leave the quotes off, or close our quotation marks too early, Python will raise an error.
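Here is a minimal sketch of both points. The `try`/`except` wrapper is not part of this lesson; it is used here only so the error can be printed without stopping the program, and the variable names are illustrative.

```python
# Single and double quotes create the same string.
single = 'Simpson'
double = "Simpson"
print(single == double)  # True

# Without quotes, Python treats Simpson as a name it should already know.
try:
    Simpson
except NameError as error:
    message = str(error)
    print(message)  # name 'Simpson' is not defined
```

Closing a quotation mark too early produces a different error, a `SyntaxError`, which Python reports before it runs any of the code.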

**NOTE: Ask students to try above examples without using quotes and discuss the error. Also, advise students to be consistent with whichever convention they choose** 




## Changing data with built-in methods

Python is picky like this for a reason. Once it knows we are working with a string, it gives us specific functionality for operating on strings. Each piece of this functionality is called a method: a function that belongs to a particular datatype.

For example, here are two methods that work with a string (text) but do not work with a number.


In [9]:
'Homer Simpson'.upper()


'HOMER SIMPSON'

In [10]:
'Homer Simpson'.lower()

'homer simpson'

**NOTE: Ask students to try using .upper() and .lower() with a number and discuss the results**

Yep.  Bad news bears.
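For reference, this is roughly what that experiment looks like. It is a sketch, not part of the original lab: the `try`/`except` wrapper only keeps the program running so the message can be printed, and the parentheses around `5` are needed because Python would otherwise read `5.` as the start of a decimal number.

```python
# Numbers do not have an upper method, so Python raises an AttributeError.
try:
    (5).upper()
except AttributeError as error:
    message = str(error)
    print(message)  # 'int' object has no attribute 'upper'
```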

We can call a method on a piece of data using the following format:

   * [DATATYPE] [DOT] [METHOD NAME] [PARENTHESES]

**NOTE: Introduce students to .endswith() method and ask them to try out a couple of examples**

In [11]:
"Homer Simpson".endswith('Simpson')

True

In [12]:
"Charles Montgomery Burns".endswith('Simpson')

False

As you can see, by following the format of data-dot-method name-parentheses we can begin to operate on our data.
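The same pattern carries over to other string methods. The sketch below reuses `upper` and `endswith` from above and adds `startswith`, a standard string method this lesson has not introduced; the variable names are only there so the results can be printed.

```python
# data . method name ( )
shouted = 'Homer Simpson'.upper()
is_simpson = 'Homer Simpson'.endswith('Simpson')
is_homer = 'Homer Simpson'.startswith('Homer')

print(shouted)     # HOMER SIMPSON
print(is_simpson)  # True
print(is_homer)    # True
```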

###  Discovering new methods

You may be starting to worry about there being too many methods to keep track of.  Let's ask Python for help with finding more information about what we can do with strings.


In [14]:
help('strings')

No Python documentation found for 'strings'.
Use help() to get the interactive help utility.
Use help(str) for help on the str class.



The `help()` function comes out of the box with Python and is like an old-school Alexa. Just like an Alexa, it often doesn't understand us. Let's follow its stern directions and see what happens when we type in `help(str)`.

In [15]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(...)
 |      S.__format__(format_spec) -> str
 |      
 |      Return a formatted version of S as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getatt

**NOTE: Explain the relevant bits from help(), then ask students to try some of the simple string methods it lists, e.g. .title() and .capitalize()**
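A quick sketch of those two methods, using a lowercase version of the name from earlier so the difference is visible. The variable names are illustrative.

```python
# title() capitalizes the first letter of every word.
titled = 'homer simpson'.title()

# capitalize() capitalizes only the first letter of the whole string.
capitalized = 'homer simpson'.capitalize()

print(titled)       # Homer Simpson
print(capitalized)  # Homer simpson
```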

That's really it for this lesson on strings, and it's easy to feel a little unsatisfied after seeing just a few methods on the datatype. But what matters more in programming than memorizing a list of features is having ways to discover and experiment.

In this lesson, we already saw a few of them.
* Guess: We tried something and looked to the error message for clues as to what to do next.
* `help(str)`: We saw a nice way to learn about new methods, then took a guess to test our understanding.
* Following a pattern: We started with a simple method call like `upper`, took a moment to break it down into a pattern, and then used that pattern again to call other methods.
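One more way to explore, not covered in this lesson, is Python's built-in `dir` function, which lists the names available on a datatype. The sketch below filters out the names that start with an underscore (like `__add__` in the `help(str)` output) to leave the everyday methods. The list-building syntax is beyond this lesson and `public_methods` is just an illustrative name.

```python
# dir(str) returns every name defined on strings; keep the ones
# that do not start with an underscore.
public_methods = [name for name in dir(str) if not name.startswith('_')]

print(public_methods)
```

Any name in that list can then be tried with the data-dot-method-parentheses pattern, or looked up with `help`, for example `help(str.title)`.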

Here is one more method of discovery:  just ask Google.  For example, look what happens when we ask Google about capitalization.

![](https://learn-verified.s3.amazonaws.com/data-science-assets/ask-google.png)

[A great link with a detailed answer.](https://stackoverflow.com/a/1549644)

![](https://learn-verified.s3.amazonaws.com/data-science-assets/stack-overflow.png)

Then we try this new method out ourselves, to see if this user on Stack Overflow is right (they normally are).
