# Python Strings

It is time to learn about one of the most fundamental data types in Python: the string. What is a string? Formally, **a string is an ordered sequence of characters**. There are two key words here: **ordered** and **characters**. Ordered means that we can use *indexing* and *slicing* to grab elements from the string. Characters hints at the fact that strings are not limited to letters of the alphabet; as we'll see, they can also hold digits, punctuation, whitespace, and other characters.

## Creating Strings

In [1]:
# Comment. Won't show up when you run a script.

In [2]:
# Single or double quotes are okay.
'hello'

'hello'

In [3]:
"hello"

'hello'

In [5]:
# Keep in mind potential errors
'I'm not a spy!

SyntaxError: invalid syntax (<ipython-input-5-8f8229029dc3>, line 2)

In [None]:
# Wrap the string in double quotes so the single quote inside it is kept
" I'm not a spy! "

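Double quotes are not the only way out. As a small sketch (the variable names here are made up for illustration), you can also escape the inner quote with a backslash or wrap the text in triple quotes:

```python
# Three equivalent ways to write a string that contains an apostrophe.
double_quoted = "I'm not a spy!"       # wrap in double quotes
escaped = 'I\'m not a spy!'            # escape the inner single quote
triple_quoted = '''I'm not a spy!'''   # triple quotes also work

print(double_quoted == escaped == triple_quoted)  # True
```
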
## Basic Printing of Strings

In a Jupyter notebook, the value of the last expression in a cell is displayed automatically. However, this is different from printing a string: only the last expression is displayed, whereas print() lets us produce multiple outputs from one cell. Let's see some useful examples:

In [6]:
'one'
'two'

'two'

In [7]:
print('one')
print('two')

one
two


In [8]:
print("Special Codes")
print('this is a new line \n notice how this prints')
print('this is a tab \t notice how this prints')
print("Notice how they both follow the general escape character of backslash ")

Special Codes
this is a new line 
 notice how this prints
this is a tab 	 notice how this prints
Notice how they both follow the general escape character of backslash 
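
Because the backslash starts an escape code, printing a literal backslash needs some care. A minimal sketch (the path below is a made-up example, not a real folder): either double the backslash or use a raw string with the r prefix.

```python
# A doubled backslash stands for one literal backslash.
escaped_path = 'C:\\new_folder'
print(escaped_path)            # C:\new_folder

# A raw string (r prefix) leaves backslashes untouched.
raw_path = r'C:\new_folder'
print(raw_path)                # C:\new_folder

# '\n' is a single newline character; r'\n' is two characters.
print(len('\n'), len(r'\n'))   # 1 2
```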


## Indexing and Slicing

Recall that strings are *ordered sequences* of characters. This means we can "grab" single characters (indexing) or grab sub-sections of the string (slicing).

### Indexing

Indexing starts at 0, so the characters of the string python are numbered like this:

    character:    p   y   t   h   o   n 
    index:        0   1   2   3   4   5
    
You can use square brackets to grab single characters:

In [9]:
word = "hello"

In [10]:
word[0]

'h'

In [11]:
word[3]

'l'

Python also supports reverse indexing:

    character:        p    y    t    h    o    n
    index:            0    1    2    3    4    5
    reverse index:   -6   -5   -4   -3   -2   -1    
    
Reverse indexing is commonly used to grab the last character, or the last "chunk", of a sequence.

In [12]:
word[-1]

'o'
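
To connect the two numbering schemes, here is a small sketch using the same word. It also shows what happens when an index falls outside the string.

```python
word = "hello"

# The last character can be reached from either end.
print(word[-1] == word[len(word) - 1])  # True

# Asking for an index past the end raises an IndexError.
try:
    word[5]
except IndexError as error:
    print("IndexError:", error)  # IndexError: string index out of range
```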

### Slicing

We can grab entire subsections of a string with *slice* notation.

This is the notation:

    [start:stop:step]

Key things to note:

1. The start index is where your slice begins.
2. The stop index is where your slice goes up to. **The character at this index is not included!**
3. The step size is how far to move forward to reach the next character; a step of 2 takes every second character.

Let's see some examples:

In [13]:
alpha = 'abcdef'

In [14]:
# NOTICE HOW d IS NOT INCLUDED!
alpha[0:3]

'abc'

In [15]:
alpha[0:4]

'abcd'

In [16]:
alpha[2:4]

'cd'

In [17]:
alpha[2:]

'cdef'

In [18]:
alpha[:2]

'ab'

In [19]:
alpha[0:6:2]

'ace'
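
Slices also work with negative numbers, and they are forgiving about indexes that run past the end. A few more cases, sketched with the same alpha string:

```python
alpha = 'abcdef'

print(alpha[::-1])    # fedcba  (a step of -1 walks backwards)
print(alpha[-3:])     # def     (the last three characters)
print(alpha[1:5:2])   # bd      (indexes 1 and 3)
print(alpha[2:100])   # cdef    (a stop past the end is not an error)
```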

## Basic String Methods

Methods are actions you can call on an object, usually in the form .method_name(). Notice the parentheses at the end. Strings have many methods, which you can browse with the Tab completion feature in Jupyter notebooks. Let's go over some of the more useful ones!

In [20]:
basic = "Hello world I am learning python"

In [21]:
basic.upper()

'HELLO WORLD I AM LEARNING PYTHON'

In [22]:
basic.lower()

'hello world i am learning python'

In [23]:
# Preview, we'll learn about lists later on!
basic.split()

['Hello', 'world', 'I', 'am', 'learning', 'python']

In [24]:
basic.split('I')

['Hello world ', ' am learning python']
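
A few more methods that tend to come up early are join, replace, and strip. The sketch below reuses the basic string; note that each method returns a new string and leaves the original unchanged.

```python
basic = "Hello world I am learning python"

words = basic.split()
print("-".join(words))                     # Hello-world-I-am-learning-python
print(basic.replace("python", "Python"))   # Hello world I am learning Python
print("  padded  ".strip())                # padded

# The original string is untouched by any of the calls above.
print(basic)                               # Hello world I am learning python
```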

## Print Formatting

You can use the .format() method on a string to perform what is formally known as **string interpolation**: essentially, inserting variables into a string when printing it. **Synopsis:** {value:width.precision}

In [25]:
user_name = "Learner"

In [26]:
print("Welcome {}".format(user_name))

Welcome Learner


In [27]:
action = 'run'

In [28]:
print("The {} needs to {}".format(user_name,action))

The Learner needs to run


In [29]:
print("The {a} needs to {b}".format(a=user_name,b=action))

The Learner needs to run


In [30]:
print("The {b} needs to {a}".format(a=user_name,b=action))

The run needs to Learner


### Formatting Numbers

In [31]:
num = 123.6789
print("The code is: {}".format(num))

The code is: 123.6789


In [32]:
# This rounds num to 1 digit after the decimal point
print("The code is: {:.1f}".format(num))

The code is: 123.7


In [33]:
print("The code is: {:.2f}".format(num))

The code is: 123.68


In [34]:
print("The code is: {:.3f}".format(num))

The code is: 123.679


In [35]:
print("The code is: {:.4f}".format(num))

The code is: 123.6789
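
The examples above only use the precision part of the synopsis {value:width.precision}. As a sketch of the width part, the code below pads the number to 10 characters; the vertical bars are only there to make the padding visible. It also shows an f-string (available in Python 3.6 and later), which accepts the same format specification.

```python
num = 123.6789

print("|{:10.2f}|".format(num))    # |    123.68|  (right-aligned in 10 characters)
print("|{:<10.2f}|".format(num))   # |123.68    |  (< aligns to the left)

# An f-string puts the variable inside the braces directly.
print(f"The code is: {num:.2f}")   # The code is: 123.68
```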


##### Next Topic: Lists

## Advanced Extras

These topics are advanced for a beginner, but they are important. Just skim through them once for now; don't try to master everything today. We will cover them in detail in future notebooks.

### Question
We are given a large string:

    '''
    
    Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English.
    
    '''

Our goal is to produce a new string, letters, that contains only the alphabetic characters
of the original string (i.e., with spaces, numbers, and punctuation removed).

In [36]:
document = '''Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English.'''

### Answer

In [37]:
letters = ''

for character in document:
    if character.isalpha():
        letters += character

print(letters)

LoremIpsumissimplydummytextoftheprintingandtypesettingindustryLoremIpsumhasbeentheindustrysstandarddummytexteversincetheswhenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbookItisalongestablishedfactthatareaderwillbedistractedbythereadablecontentofapagewhenlookingatitslayoutThepointofusingLoremIpsumisthatithasamoreorlessnormaldistributionoflettersasopposedtousingContentherecontentheremakingitlooklikereadableEnglish


#### Read Carefully:
Although this method seems simple and correct, it is a very inefficient solution for large inputs.
Strings are immutable, so they cannot be extended in place. The command, letters += character, would
presumably compute the concatenation, letters + character, as a new string instance and
then reassign the identifier, letters, to that result. Constructing that new string
requires time proportional to its length. If the final result has n characters, the
series of concatenations takes time proportional to the familiar sum
1 + 2 + 3 + ··· + n = n(n + 1)/2, and therefore O(n^2) time.
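
To make that sum concrete, here is a simplified cost model, not a measurement. The function copy_cost is a hypothetical helper written for this illustration; it assumes every += builds a brand-new string and copies all of its characters.

```python
def copy_cost(n):
    """Count characters copied if each += builds a brand-new string."""
    total = 0
    for length in range(1, n + 1):
        total += length  # a new string of this length copies `length` characters
    return total

print(copy_cost(10))     # 55
print(copy_cost(1000))   # 500500, which equals 1000 * 1001 // 2
```

Making the input 100 times longer makes the modelled cost roughly 10,000 times larger. Real interpreters may optimise some in-place concatenations, so treat this as a picture of the worst case.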

#### Improvement: 

In [38]:
temp_str = [ ] # start with empty list

for character in document:
    if character.isalpha( ):
        temp_str.append(character) # append alphabetic character
    
letters = ''.join(temp_str)
print(letters)

LoremIpsumissimplydummytextoftheprintingandtypesettingindustryLoremIpsumhasbeentheindustrysstandarddummytexteversincetheswhenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbookItisalongestablishedfactthatareaderwillbedistractedbythereadablecontentofapagewhenlookingatitslayoutThepointofusingLoremIpsumisthatithasamoreorlessnormaldistributionoflettersasopposedtousingContentherecontentheremakingitlooklikereadableEnglish


Here we used a list, which is mutable, so appending to it does not copy the elements already stored. In later notebooks we will see how lists are used. This approach runs in O(n) time.

#### Further Improvement:

Here we use a list comprehension, which we will see in future notebooks. It replaces the explicit loop and the repeated append calls with a single expression, although the temporary list is still built in memory.

In [39]:
letters = "".join([char for char in document if char.isalpha()])
#print(letters)

The above is a Python one-liner; we will see more of these in later notebooks. Remember, just read through it for now
without getting stuck on numerous Google searches.
P.S.: Curious readers have probably already opened tabs for "join" and "list" in Python.

#### Best Solution

Here we use a generator expression, which produces its values lazily, one at a time, as they are requested, so we never write out a temporary list ourselves. Again, we will see generators in future notebooks.

In [40]:
letters = "".join((char for char in document if char.isalpha()))
print(letters)

LoremIpsumissimplydummytextoftheprintingandtypesettingindustryLoremIpsumhasbeentheindustrysstandarddummytexteversincetheswhenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbookItisalongestablishedfactthatareaderwillbedistractedbythereadablecontentofapagewhenlookingatitslayoutThepointofusingLoremIpsumisthatithasamoreorlessnormaldistributionoflettersasopposedtousingContentherecontentheremakingitlooklikereadableEnglish
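
As a quick sanity check, the sketch below runs the loop, the list comprehension, and the generator expression on a short made-up string (sample is a hypothetical value, not part of the exercise) and confirms that all three give the same result.

```python
sample = "It's 2 a.m., more-or-less!"

# Approach 1: repeated concatenation.
looped = ''
for character in sample:
    if character.isalpha():
        looped += character

# Approach 2: list comprehension, then join.
from_list = ''.join([char for char in sample if char.isalpha()])

# Approach 3: generator expression passed straight to join.
from_generator = ''.join(char for char in sample if char.isalpha())

print(looped)                                  # Itsammoreorless
print(looped == from_list == from_generator)   # True
```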


Excellent work! Keep it up.