# Introduction to Strings

Strings are usually less important in numerical computing than in many other uses of Python, such as web-based activities. Nonetheless, it is often useful to have a minimal understanding of strings, such as how to format numbers when producing tables.

## Imports

In [1]:
from __future__ import print_function
import numpy as np

## Basic string use

There are two basic types of strings, `str` and `unicode`. _Raw_ strings, which are just `str` written with a different literal syntax, are often useful for sequences containing backslashes (`\`). In most Python strings, a backslash begins an escape sequence, so that `\n` produces a new line, `\t` a tab, and `\\` a single backslash. When backslashes should be treated as literal backslashes and not as escape sequences, raw strings make them easier to enter. For example, compare the following two strings, which are identical.

```python
'c:\\temp\\some_dir\\nested_dir'
r'c:\temp\some_dir\nested_dir'
```
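
A direct comparison is a quick way to confirm that two spellings denote the same string. The sketch below uses an illustrative Windows-style path with no trailing backslash, since a raw string literal cannot end in a single backslash. The variable names are chosen here only for the example.

```python
# The same path written with escaped backslashes and as a raw string.
escaped = 'c:\\temp\\some_dir\\nested_dir'
raw_path = r'c:\temp\some_dir\nested_dir'

# Both literals produce exactly the same characters.
same = escaped == raw_path
print(same)

# In a raw string, backslash-n is two characters; otherwise it is one newline.
raw_len = len(r'\n')
escaped_len = len('\n')
print(raw_len, escaped_len)
```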

Unicode strings allow for characters beyond the 128-character ASCII set. They are written like ordinary strings, only with a `u` prefix, or created by explicitly calling `unicode`, as in

```python
u'A unicode string'
unicode('A unicode string')
```

In [2]:
string = 'This is a string.\nThis will be a new line and\twill be a tab'
raw = r'This is a string.\nThis will be a new line and\twill be a tab'
unicode_string = u'This is a string.\nThis will be a new line and\twill be a tab'

print(string)
print(raw)
print(unicode_string)

This is a string.
This will be a new line and	will be a tab
This is a string.\nThis will be a new line and\twill be a tab
This is a string.
This will be a new line and	will be a tab


Basic string addition and multiplication were introduced in an earlier lesson and are reproduced below.

In [3]:
print('*' * 20)
print('a' + 'b')

********************
ab


Strings can be joined with a separator, which may be a single character or any string, using `join`.

In [4]:
joined = ', '.join(('a','b','c'))
print(joined)

a, b, c
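
One detail that often matters in numerical work is that `join` only accepts strings, so numbers have to be converted first. A minimal sketch, with made-up values and using `str` for the conversion:

```python
values = [3.14159, 2.71828, 1.61803]

# join raises a TypeError for non-string items, so convert each number first.
row = ', '.join(str(v) for v in values)
print(row)
```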


Similarly, `split` can be used to split a string on a character or general string, and `strip` can be used to remove leading and trailing whitespace characters.

In [5]:
split = joined.split(',')
print(split)

stripped = [s.strip() for s in split]
print(stripped)

['a', ' b', ' c']
['a', 'b', 'c']
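
The same two methods are enough to parse a simple line of delimited numbers. The sketch below assumes a hypothetical comma-separated line with irregular spacing. It also shows that calling `split` with no argument splits on any run of whitespace.

```python
line = ' 1.5, 2.25 ,3 '

# Split on commas, then remove the stray whitespace around each field.
fields = [s.strip() for s in line.split(',')]
numbers = [float(s) for s in fields]
print(fields)
print(numbers)

# With no argument, split breaks on any run of whitespace.
words = 'a  b\tc'.split()
print(words)
```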


## Formatting Strings - New Method

The modern method to produce formatted strings from numbers is to use `format` with a string that contains replacement fields such as `{0}`, `{keyword}` or `{1:`_format_`}`, where _format_ is a format string which describes how the number should be output.

When a field contains a number, e.g. `{0}` or `{1:0.3f}`, that number is the index of the positional argument passed to `format`.

The three conversions below show the default output, a format that controls the number of decimal places (`0.3f`), and one which always uses exponential notation (`0.4e`) with a fixed number of places to the right of the decimal.

In [6]:
pi, e, golden = np.pi, np.exp(1), 0.5+np.sqrt(5)/2
'Three important numbers are {0}, {1:0.3f} and {2:0.4e}'.format(pi,e,golden)

'Three important numbers are 3.141592653589793, 2.718 and 1.6180e+00'

The next example is similar, only it uses named fields, which are filled from the keyword arguments passed to `format`.

In [7]:
'Three important numbers are {pi:f}, {e:0.3f} and {gr:0.4e}'.format(pi=pi,e=e,gr=golden)

'Three important numbers are 3.141593, 2.718 and 1.6180e+00'

One of the advantages of "new" style format strings is that an argument can be used repeatedly without passing it more than once.

In [8]:
base = 'Three important numbers are {e:0.3f}, {golden:0.4e} and {pi:0.13f} (bka {pi:0.2f})'
base.format(pi=np.pi,e=np.exp(1),golden=0.5+np.sqrt(5)/2)

'Three important numbers are 2.718, 1.6180e+00 and 3.1415926535898 (bka 3.14)'

## Formats

`0.`_n_`f` is probably the most useful, where _n_ controls the number of decimal places shown.

In [9]:
'Pi is {0:0.4f}'.format(np.pi)

'Pi is 3.1416'

`g` is general, and is not always identical to `f` since _n_ is now the total number of significant digits shown.  It also differs for large numbers, where it switches to `e`.

In [10]:
'Pi is {0:0.4g}, and when multiplied by a  large number {1:0.4g}'.format(pi, 1000000000*pi)

'Pi is 3.142, and when multiplied by a  large number 3.142e+09'

In [11]:
'100 times Pi is {0:0.4f}'.format(100 * pi)

'100 times Pi is 314.1593'

In [12]:
'100 times Pi is {0:0.4g}'.format(100 * pi)

'100 times Pi is 314.2'
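
The difference between `f` and `g` is easiest to see by formatting values of very different magnitudes side by side. The values below are arbitrary and chosen only for illustration. `g` also switches to exponential notation for very small numbers.

```python
values = [0.0000123, 3.14159265, 314.159265, 3141592653.59]

# f fixes the digits after the decimal point; g fixes the significant digits.
f_strings = ['{0:0.4f}'.format(v) for v in values]
g_strings = ['{0:0.4g}'.format(v) for v in values]

for f_text, g_text in zip(f_strings, g_strings):
    print(f_text, g_text)
```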

`e` always uses exponential notation.

In [13]:
'Pi is {0:0.3e}'.format(np.pi)

'Pi is 3.142e+00'

`%` can be used to output percents (which are automatically scaled by 100).

In [14]:
'Pi is {0:0.3%} (in percent)'.format(np.pi)

'Pi is 314.159% (in percent)'

The leading number sets the minimum width of the output, which produces padding when the formatted number is shorter than that width.

In [15]:
'Pi is {0:20.3f} (lots of spaces!), but fewer here {1:20.3f}'.format(pi, 1000000000000*pi)

'Pi is                3.142 (lots of spaces!), but fewer here    3141592653589.793'
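
Fixed widths are what make it possible to print aligned tables. The sketch below is one possible layout, not the only one. It additionally uses the alignment flags `<` (left) and `>` (right) from the format specification mini-language, which are not otherwise covered in this lesson, and the column widths are arbitrary.

```python
names = ['pi', 'e', 'golden ratio']
values = [3.141592653589793, 2.718281828459045, 1.618033988749895]

# A 14-character left-aligned name column and a 10-character right-aligned value column.
header = '{0:<14}{1:>10}'.format('Constant', 'Value')
lines = [header, '-' * len(header)]
for name, value in zip(names, values):
    lines.append('{0:<14}{1:>10.4f}'.format(name, value))

print('\n'.join(lines))
```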

Similarly, `+` can be added to always show a sign.  The default behavior, which can also be requested explicitly with `-`, is to only show the sign when negative.  A blank space can be used in place of the plus to include a leading space when the number is positive and a minus sign when negative, which is useful for aligning output.

In [16]:
'Pi is {0:+0.3f} (with a sign!)'.format(np.pi)

'Pi is +3.142 (with a sign!)'
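
The three sign options can be compared directly on a positive and a negative value. This is a small sketch with arbitrary values; the brackets in the printed output only make the leading space visible.

```python
results = []
for value in (3.14159, -3.14159):
    plus = '{0:+0.3f}'.format(value)   # always show a sign
    space = '{0: 0.3f}'.format(value)  # leading space for positive numbers
    minus = '{0:-0.3f}'.format(value)  # sign only when negative (the default)
    results.append((plus, space, minus))
    print('[{0}] [{1}] [{2}]'.format(plus, space, minus))
```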

## Formatting Strings - Old Method

Python 2.x still allows for old-style formatting using format specifiers which always start with `%` and then contain the format.  The format string is followed by the `%` operator and then a tuple with the required values (strictly in order).

In [17]:
'Three important numbers are %f, %0.3f and %0.4e' % (np.pi,np.exp(1),0.5+np.sqrt(5)/2)

'Three important numbers are 3.141593, 2.718 and 1.6180e+00'
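
For simple numeric formats the two styles produce the same text, which can be checked directly. The sketch below also shows the dictionary form of old-style formatting, which allows named values. The strings and names are illustrative only.

```python
value = 2.718281828459045

# The same format code, 0.3f, written in both styles.
old_style = 'e is approximately %0.3f' % value
new_style = 'e is approximately {0:0.3f}'.format(value)
print(old_style == new_style)

# Old-style strings can also take values from a dictionary by name.
named = '%(name)s = %(value)0.2f' % {'name': 'e', 'value': value}
print(named)
```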

## Conversion from strings

String conversion to integers, longs, floats and complex numbers is simple using `int`, `long`, `float` and `complex`.

In [19]:
int('32')

32

In [21]:
float('32')

32.0

In [22]:
complex('2+3j')

(2+3j)
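
A few details of these conversions are worth knowing when reading data. The sketch below uses arbitrary inputs to show that surrounding whitespace is ignored, that `float` accepts scientific notation, and that `int` rejects text containing a decimal point unless it is converted through `float` first.

```python
# Surrounding whitespace is ignored by the numeric constructors.
a = int('  32 ')

# Scientific notation is accepted by float.
b = float('1e-3')

# int('32.0') raises a ValueError, so convert through float first.
c = int(float('32.0'))

print(a, b, c)
```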

Note that trying to convert a string that does not represent a number results in a `ValueError`.

In [23]:
float('apple')

ValueError: could not convert string to float: 'apple'

When automatically parsing a string from a file, it is usually a good idea to use `try` and `except` to avoid errors.

In [24]:
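# x = 'apple'  # uncomment to see the 'Not a number' branch
x = '3.14+2j'
# Try the most restrictive conversion first and fall back on failure.
try:
    int(x)
    print('int')
except ValueError:
    try:
        float(x)
        print('float')
    except ValueError:
        try:
            complex(x)
            print('complex')
        except ValueError:
            print('Not a number')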

complex
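
The nested `try` blocks can be flattened by looping over the converters in order. This is one possible sketch, not the only way to write it. The helper name `parse_number` is hypothetical, and returning `None` for non-numeric text is an assumption made here for illustration.

```python
def parse_number(text):
    """Return text as an int, float or complex, or None if it is not a number."""
    # Try the most restrictive type first, as in the nested version.
    for converter in (int, float, complex):
        try:
            return converter(text)
        except ValueError:
            continue
    return None


for candidate in ('32', '3.14', '3.14+2j', 'apple'):
    print(candidate, '->', repr(parse_number(candidate)))
```

Catching only `ValueError` keeps unrelated errors, such as a misspelled variable name, from being silently swallowed.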
