# Exercise 0 - Python basics
_By Abigail Hayes._

This exercise covers the basics of handling text and numeric data in Python. We will look at the following:
- Handling strings
- Working with text data in Pandas data frames
- NumPy arrays

Each section ends with a short task so you can try some variations yourself.

## Setup

First we will set up our environment.

### Environment

If these Python packages are not yet available in your environment, install them before continuing:
- numpy - for working with numerical arrays
- pandas - for tabular data analysis

In [1]:
import pandas as pd
import numpy as np
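As a quick check that the setup worked, we can print the installed versions. This is only a sketch: if either import fails, the package is missing, and one common way to install both is `pip install numpy pandas`, although the right command depends on how your environment is managed.

```python
import numpy as np
import pandas as pd

# Each package exposes its version as a string.
print("numpy", np.__version__)
print("pandas", pd.__version__)
```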

## Working with strings

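In Python, text is stored as a string (type `str`). A string is an object, so it supports operators such as `+`, slicing with `[]`, and methods such as `.startswith()` and `.replace()`.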

In [2]:
example = 'This is a very important text string.'

print(example)

This is a very important text string.


In [3]:
example + example

'This is a very important text string.This is a very important text string.'

In [4]:
example[:10]

'This is a '

In [5]:
example.startswith('Hello')

False

In [6]:
example2 = example.replace('text string', 'announcement')

print(example)
print(example2)

This is a very important text string.
This is a very important announcement.
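Note that `example` was unchanged by `.replace()`: strings are immutable, so string methods return a new string rather than modifying the original. The sketch below shows the same behaviour with a few other built-in methods. The variable name `shouted` is just for illustration.

```python
example = 'This is a very important text string.'

# Methods return a new string; the original is left untouched.
shouted = example.upper()
print(shouted)
print(example)

# Other common operations on strings.
print(len(example))             # number of characters
print('important' in example)   # substring test
print(example.endswith('.'))    # check the final character
```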


## Task 1

Now have a go yourself.

Can you use Python string methods to:
- Check whether `my_number` contains only digits?
- Replace 'YOURNAME' in `my_name` with your name?

In [7]:
my_number = '09709709'


In [8]:
my_name = 'Hello! My name is YOURNAME.'


## Text in Pandas data frames

Generally we won't be working with just one string at a time. Often, we will have a whole data frame full of different strings.

Here we have a toy data frame to practice on:

In [9]:
data = {
    'subject': ['Maths', 'Science', 'History', 'English', 'Art'],
    'feedback': [
        'Challenging but fun once you understand the concepts.',
        'Exciting experiments make it very engaging.',
        'Lots of dates to memorise, but the stories are interesting.',
        'Improved my writing skills significantly.',
        'Allows creative expression and I feel free.'
    ],
    'difficulty_level': ['High', 'Medium', 'Medium', 'Low', 'Low']
}

df = pd.DataFrame(data)

print(df)

   subject                                           feedback difficulty_level
0    Maths  Challenging but fun once you understand the co...             High
1  Science        Exciting experiments make it very engaging.           Medium
2  History  Lots of dates to memorise, but the stories are...           Medium
3  English          Improved my writing skills significantly.              Low
4      Art        Allows creative expression and I feel free.              Low


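We can use the same string methods on a whole column through its `.str` accessor, which applies the method to every row and returns a new column of results.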

In [10]:
df['feedback']

0    Challenging but fun once you understand the co...
1          Exciting experiments make it very engaging.
2    Lots of dates to memorise, but the stories are...
3            Improved my writing skills significantly.
4          Allows creative expression and I feel free.
Name: feedback, dtype: object

In [11]:
df['feedback'].str.startswith('L')

0    False
1    False
2     True
3    False
4    False
Name: feedback, dtype: bool
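The `.str` accessor exposes many other string methods, and a boolean result like the one above can be used to filter rows. The sketch below uses a shortened copy of the toy data frame so that it runs on its own. The names `lengths`, `mentions_but` and `filtered` are illustrative only.

```python
import pandas as pd

# Shortened copy of the toy data frame from above.
df = pd.DataFrame({
    'subject': ['Maths', 'Science', 'History'],
    'feedback': [
        'Challenging but fun once you understand the concepts.',
        'Exciting experiments make it very engaging.',
        'Lots of dates to memorise, but the stories are interesting.',
    ],
})

# Number of characters in each feedback entry.
lengths = df['feedback'].str.len()
print(lengths)

# Boolean column: does the feedback contain the literal text 'but'?
mentions_but = df['feedback'].str.contains('but', regex=False)

# Use the boolean column to keep only the matching rows.
filtered = df[mentions_but]
print(filtered['subject'].tolist())
```

Calling the method directly on the column, without `.str`, raises an `AttributeError`, because the method belongs to individual strings rather than to the column.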

## Task 2

Can you add another column containing only the first word of the feedback? Call it 'headline'.

## NumPy arrays

Now we will look at NumPy arrays. They are mostly used for numeric data, but can hold other types too.

In [12]:
x = np.array([[1, 2, 3], [2, 4, 6], [7, 9, 13]])

print(x)
print(x.shape)

[[ 1  2  3]
 [ 2  4  6]
 [ 7  9 13]]
(3, 3)


In [13]:
x/100

array([[0.01, 0.02, 0.03],
       [0.02, 0.04, 0.06],
       [0.07, 0.09, 0.13]])

In [14]:
x+x

array([[ 2,  4,  6],
       [ 4,  8, 12],
       [14, 18, 26]])

In [15]:
x[0,2]

3
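Every array has a single data type (`dtype`) shared by all of its elements, which is why dividing the integer array above produced an array of floats. The following sketch inspects the `dtype` in a few cases and shows row indexing and simple aggregation. The `labels` array is a made-up example of non-numeric data.

```python
import numpy as np

x = np.array([[1, 2, 3], [2, 4, 6], [7, 9, 13]])

# An integer dtype; the exact width depends on the platform.
print(x.dtype)

# True division always yields floats.
print((x / 100).dtype)

# A single index selects a whole row.
print(x[1])

# Aggregations work over all elements by default.
print(x.sum(), x.mean())

# Arrays can also hold text, stored with a fixed-width Unicode dtype.
labels = np.array(['low', 'medium', 'high'])
print(labels.dtype)
```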

## Task 3

Can you:
- Slice `x` to only have the last column?
- Check which values in `x` are larger than 10?
- Replace 13 with 14?
- Get the transpose of the array?