### Revenues

Run the cell below to load a retail company's monthly revenues for two separate years.

In [1]:
import pandas as pd

revenues = pd.DataFrame({
    2015: ['10232431p', '11432812p', '12938450p', '30432209p', '20110887p', '14903945p',
           '16774532p', '14687687p', '30462134p', '39768256p', '58342173p', '80107291p'],
    2016: ['9187174p', '10242983p', '11143096p', '30954299p', '20333143p', '16780122p',
           '18430973p', '15090314p', '31286712p', '41552798p', '60131181p', '84270667p'],
}, index=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])

revenues

|     | 2015      | 2016      |
|-----|-----------|-----------|
| Jan | 10232431p | 9187174p  |
| Feb | 11432812p | 10242983p |
| Mar | 12938450p | 11143096p |
| Apr | 30432209p | 30954299p |
| May | 20110887p | 20333143p |
| Jun | 14903945p | 16780122p |
| Jul | 16774532p | 18430973p |
| Aug | 14687687p | 15090314p |
| Sep | 30462134p | 31286712p |
| Oct | 39768256p | 41552798p |
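Because the dictionary keys passed to `pd.DataFrame` are the integers `2015` and `2016`, the column labels are integers rather than strings. That is why later cells write `revenues[2015]` and not `revenues['2015']`. A minimal sketch on a hypothetical one-row frame built the same way:

```python
import pandas as pd

# Hypothetical one-row frame built the same way as `revenues`
df = pd.DataFrame({2015: ['100p'], 2016: ['200p']}, index=['Jan'])

print(df.columns.tolist())   # [2015, 2016]: integer labels
print(df[2015]['Jan'])       # 100p

# A string label does not match an integer column label
try:
    df['2015']
    string_lookup_failed = False
except KeyError:
    string_lookup_failed = True

print(string_lookup_failed)  # True
```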


Check the data types of each column first:

In [2]:
# Run this cell

revenues.dtypes

2015    object
2016    object
dtype: object

These numbers are given in pence, but the analysis should be carried out in £. 

Write a function to strip off the 'p', and divide by 100.

Map this function onto columns `2015` and `2016`.

In [3]:
def convert_to_pounds(pence_str):
    pence = float(pence_str.rstrip('p'))
    pounds = pence / 100
    return pounds

for column in [2015, 2016]:
    revenues[column] = revenues[column].map(convert_to_pounds)
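The loop above applies `convert_to_pounds` one element at a time. A vectorised alternative, sketched here on the first two months of data as an option rather than what the exercise asks for, uses the pandas `.str` accessor to strip the suffix and `astype(float)` to convert the whole column at once:

```python
import pandas as pd

# First two months of the raw data, still strings with a trailing 'p'
raw = pd.DataFrame({
    2015: ['10232431p', '11432812p'],
    2016: ['9187174p', '10242983p'],
}, index=['Jan', 'Feb'])

# For each column: strip the 'p', convert to float, convert pence to pounds
pounds = raw.apply(lambda col: col.str.rstrip('p').astype(float) / 100)

print(pounds)
print(pounds.dtypes)
```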

Now check the dtypes of each column again, to make sure you have floats:

In [4]:
revenues.dtypes

2015    float64
2016    float64
dtype: object

What was the total revenue for 2015?

In [5]:
revenues[2015].sum()

3401928.0700000003

And for 2016?

In [6]:
revenues[2016].sum()

3494034.6200000001
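The stray trailing digits in these totals come from binary floating-point representation: decimal amounts such as 102324.31 cannot be stored exactly as floats, so small errors appear when they are added up. One way to avoid this, sketched below as an assumption rather than as part of the exercise, is to sum the whole pence as integers and divide by 100 only once:

```python
import pandas as pd

# The twelve 2015 monthly values, copied from the cell that builds `revenues`
pence_2015 = pd.Series(['10232431p', '11432812p', '12938450p', '30432209p',
                        '20110887p', '14903945p', '16774532p', '14687687p',
                        '30462134p', '39768256p', '58342173p', '80107291p'])

# Converting to pounds first and then summing can accumulate float error
total_float = (pence_2015.str.rstrip('p').astype(float) / 100).sum()

# Summing whole pence as integers is exact; divide by 100 only at the end
total_pence = pence_2015.str.rstrip('p').astype(int).sum()
total_pounds = total_pence / 100

print(total_pence)    # 340192807
print(total_pounds)   # 3401928.07
print(round(total_float, 2))
```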

Create a new column `"Change"` with the change in revenues between 2015 and 2016.

In [7]:
revenues['Change'] = revenues[2016] - revenues[2015]
revenues

|     | 2015      | 2016      | Change    |
|-----|-----------|-----------|-----------|
| Jan | 102324.31 | 91871.74  | -10452.57 |
| Feb | 114328.12 | 102429.83 | -11898.29 |
| Mar | 129384.5  | 111430.96 | -17953.54 |
| Apr | 304322.09 | 309542.99 | 5220.9    |
| May | 201108.87 | 203331.43 | 2222.56   |
| Jun | 149039.45 | 167801.22 | 18761.77  |
| Jul | 167745.32 | 184309.73 | 16564.41  |
| Aug | 146876.87 | 150903.14 | 4026.27   |
| Sep | 304621.34 | 312867.12 | 8245.78   |
| Oct | 397682.56 | 415527.98 | 17845.42  |
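Subtracting the two columns directly is the clearest approach. For comparison, pandas `DataFrame.diff(axis=1)` subtracts each column from the one to its right, so its `2016` column holds the same change; the first column has nothing to its left and becomes `NaN`. A small sketch using the first two rows of the table above:

```python
import pandas as pd

# First two rows of revenues, already converted to pounds
df = pd.DataFrame({2015: [102324.31, 114328.12],
                   2016: [91871.74, 102429.83]}, index=['Jan', 'Feb'])

# Column-wise difference: each column minus the column to its left
diffs = df.diff(axis=1)
change = diffs[2016]

# The explicit subtraction used in the cell above
manual = df[2016] - df[2015]

print(change)
print(manual)
```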


Create a new column `"Percentage Change"` with the percentage change in revenues between 2015 and 2016.

In [8]:
revenues['Percentage Change'] = (revenues['Change'] / revenues[2015]) * 100
revenues

|     | 2015      | 2016      | Change    | Percentage Change |
|-----|-----------|-----------|-----------|-------------------|
| Jan | 102324.31 | 91871.74  | -10452.57 | -10.215139        |
| Feb | 114328.12 | 102429.83 | -11898.29 | -10.407142        |
| Mar | 129384.5  | 111430.96 | -17953.54 | -13.876113        |
| Apr | 304322.09 | 309542.99 | 5220.9    | 1.715584          |
| May | 201108.87 | 203331.43 | 2222.56   | 1.105153          |
| Jun | 149039.45 | 167801.22 | 18761.77  | 12.588459         |
| Jul | 167745.32 | 184309.73 | 16564.41  | 9.874737          |
| Aug | 146876.87 | 150903.14 | 4026.27   | 2.741255          |
| Sep | 304621.34 | 312867.12 | 8245.78   | 2.706895          |
| Oct | 397682.56 | 415527.98 | 17845.42  | 4.487353          |
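The percentage change divides the change by the 2015 value and multiplies by 100, which is equivalent to (2016 / 2015 - 1) * 100. Checking the January row by hand with both forms:

```python
# January revenues in pounds, taken from the table above
jan_2015 = 102324.31
jan_2016 = 91871.74

# Form used in the cell above: change / old value * 100
pct = (jan_2016 - jan_2015) / jan_2015 * 100

# Equivalent ratio form: (new / old - 1) * 100
pct_alt = (jan_2016 / jan_2015 - 1) * 100

print(round(pct, 6))   # -10.215139
```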


### Tech Firms (Some More)

Run the cell below to load the tech firms data again.

In [9]:
tech_firms = pd.read_json('data/tech_firms.json')
# Shows the first 5 rows
tech_firms.head()

|   | Company             | FY   | Headquarters            | Market cap ($B) | Revenue ($B) |
|---|---------------------|------|-------------------------|-----------------|--------------|
| 0 | Apple Inc.          | 2016 | Cupertino, CA, US       | 815.39          | 215.6        |
| 1 | Amazon.com          | 2016 | Seattle, WA, US         | 478.0           | 135.9        |
| 2 | Samsung Electronics | 2016 | Suwon, South Korea      | 311.0           | 173.9        |
| 3 | Foxconn             | 2016 | New Taipei City, Taiwan | 66.0            | 135.1        |
| 4 | Alphabet Inc.       | 2016 | Mountain View, CA, US   | 676.0           | 90.2         |


Write a function that takes in a string and returns `True` if the string ends with `'US'` and returns `False` otherwise.

In [10]:
def is_america(hq_str):
    # str.endswith already returns True or False, so no if/else is needed
    return hq_str.endswith('US')

Map this function onto the column `"Headquarters"`.

In [11]:
tech_firms['Headquarters'].map(is_america)

0      True
1      True
2     False
3     False
4      True
5      True
6     False
7      True
8     False
9     False
10    False
11     True
12     True
13     True
Name: Headquarters, dtype: bool
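Mapping a Python function works for any rule you can write. For this particular test, the pandas `.str.endswith` method produces the same boolean mask without a custom function; its `na` parameter (for example `na=False`) sets the result for missing values. A sketch on the first five headquarters from the table above:

```python
import pandas as pd

# First five Headquarters values, copied from tech_firms.head()
hq = pd.Series(['Cupertino, CA, US', 'Seattle, WA, US', 'Suwon, South Korea',
                'New Taipei City, Taiwan', 'Mountain View, CA, US'])

def is_america(hq_str):
    return hq_str.endswith('US')

mask_map = hq.map(is_america)        # the approach used in this exercise
mask_str = hq.str.endswith('US')     # vectorised string method

print(mask_str.tolist())   # [True, True, False, False, True]
```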

Use this mapping to select only the companies whose `Headquarters` is in the US.

In [12]:
tech_firms[tech_firms['Headquarters'].map(is_america)]

|    | Company                    | FY   | Headquarters          | Market cap ($B) | Revenue ($B) |
|----|----------------------------|------|-----------------------|-----------------|--------------|
| 0  | Apple Inc.                 | 2016 | Cupertino, CA, US     | 815.39          | 215.6        |
| 1  | Amazon.com                 | 2016 | Seattle, WA, US       | 478.0           | 135.9        |
| 4  | Alphabet Inc.              | 2016 | Mountain View, CA, US | 676.0           | 90.2         |
| 5  | Microsoft                  | 2016 | Redmond, WA, US       | 561.0           | 85.3         |
| 7  | IBM                        | 2016 | Armonk, NY, US        | 145.0           | 79.9         |
| 11 | Dell Technologies          | 2016 | Austin, TX, US        | 14.0            | 64.8         |
| 12 | Intel                      | 2016 | Santa Clara, CA, US   | 163.0           | 59.3         |
| 13 | Hewlett Packard Enterprise | 2016 | Palo Alto, CA, US     | 30.0            | 50.1         |


What is the total revenue of the companies on the list that are headquartered in the US?

In [13]:
billion = 1 * 10 ** 9

tech_firms.loc[tech_firms['Headquarters'].map(is_america), 'Revenue ($B)'].sum() * billion

781100000000.0

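You can use a tilde (`~`) to negate a boolean mask, which selects the opposite set of rows.

For example:

`df[df['Col1'] > 5]`

will return all rows for which column `"Col1"` is greater than 5.

However,

`df[~(df['Col1'] > 5)]`

will return all rows for which column `"Col1"` is *not* greater than 5. The parentheses are required: `~` binds more tightly than `>`, so `~df['Col1'] > 5` would apply `~` to the values in `"Col1"` themselves (a bitwise NOT for integers) before the comparison.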
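The precedence pitfall is easy to see on a hypothetical three-row frame. For integers, `~x` equals `-x - 1`, so the unparenthesised version compares -4, -7 and -10 against 5:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'Col1': [3, 6, 9]})

# Correct: negate the boolean mask, keeping rows where Col1 <= 5
not_greater = df[~(df['Col1'] > 5)]

# Pitfall: ~ is applied to the integers first (~3 == -4, ~6 == -7, ~9 == -10),
# and none of those values is greater than 5
pitfall = df[~df['Col1'] > 5]

print(not_greater)     # only the row where Col1 is 3
print(len(pitfall))    # 0
```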

Using the tilde, what is the total revenue of the companies on the list that are not in the US?

In [14]:
tech_firms.loc[~tech_firms['Headquarters'].map(is_america), 'Revenue ($B)'].sum() * billion

609800000000.00012
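Instead of filtering twice, you can group by the boolean mask itself and get both totals in one step, with `False` for non-US and `True` for US companies. A sketch using only the five rows shown by `tech_firms.head()` above, so the totals differ from the full-list answers:

```python
import pandas as pd

# First five rows of tech_firms, copied from the head() output above
firms = pd.DataFrame({
    'Company': ['Apple Inc.', 'Amazon.com', 'Samsung Electronics',
                'Foxconn', 'Alphabet Inc.'],
    'Headquarters': ['Cupertino, CA, US', 'Seattle, WA, US', 'Suwon, South Korea',
                     'New Taipei City, Taiwan', 'Mountain View, CA, US'],
    'Revenue ($B)': [215.6, 135.9, 173.9, 135.1, 90.2],
})

def is_america(hq_str):
    return hq_str.endswith('US')

in_us = firms['Headquarters'].map(is_america)

# One group per mask value: False (not US) and True (US)
totals = firms.groupby(in_us)['Revenue ($B)'].sum()
print(totals)
```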

### Superstore Sales

Tableau provides a sample data set named "Superstore Sales". Load this data set and display the first 5 rows by running the cell below.

In [15]:
superstore_sales = pd.read_csv('data/superstore_sales.csv')
superstore_sales.head()

|   | Row ID | Order ID | Order Date | Order Priority | Order Quantity | Sales | Discount | Ship Mode | Profit | Unit Price | ... | Customer Name | Province | Region | Customer Segment | Product Category | Product Sub-Category | Product Name | Product Container | Product Base Margin | Ship Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | 13/10/2010 | Low | 6 | 261.54 | 0.04 | Regular Air | -213.25 | 38.94 | ... | Muhammed MacIntyre | Nunavut | Nunavut | Small Business | Office Supplies | Storage & Organization | Eldon Base for stackable storage shelf, platinum | Large Box | 0.8 | 20/10/2010 |
| 1 | 49 | 293 | 01/10/2012 | High | 49 | 10123.02 | 0.07 | Delivery Truck | 457.81 | 208.16 | ... | Barry French | Nunavut | Nunavut | Consumer | Office Supplies | Appliances | 1.7 Cubic Foot Compact "Cube" Office Refrigera... | Jumbo Drum | 0.58 | 02/10/2012 |
| 2 | 50 | 293 | 01/10/2012 | High | 27 | 244.57 | 0.01 | Regular Air | 46.71 | 8.69 | ... | Barry French | Nunavut | Nunavut | Consumer | Office Supplies | Binders and Binder Accessories | Cardinal Slant-D® Ring Binder, Heavy Gauge Vinyl | Small Box | 0.39 | 03/10/2012 |
| 3 | 80 | 483 | 10/07/2011 | High | 30 | 4965.7595 | 0.08 | Regular Air | 1198.97 | 195.99 | ... | Clay Rozendal | Nunavut | Nunavut | Corporate | Technology | Telephones and Communication | R380 | Small Box | 0.58 | 12/07/2011 |
| 4 | 85 | 515 | 28/08/2010 | Not Specified | 19 | 394.27 | 0.08 | Regular Air | 30.94 | 21.78 | ... | Carlos Soltero | Nunavut | Nunavut | Consumer | Office Supplies | Appliances | Holmes HEPA Air Purifier | Medium Box | 0.5 | 30/08/2010 |
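The `Order Date` and `Ship Date` values above appear to use a day/month/year layout (for example `13/10/2010` and `28/08/2010`), and `read_csv` loads them as plain strings unless told otherwise. A sketch of parsing them as dates with the `parse_dates` and `dayfirst` options, using a hypothetical two-row inline excerpt in place of `data/superstore_sales.csv`:

```python
import io

import pandas as pd

# Two-row excerpt with the same dd/mm/yyyy layout as the real file
csv_text = """Row ID,Order Date,Ship Date
1,13/10/2010,20/10/2010
49,01/10/2012,02/10/2012
"""

# dayfirst=True reads 01/10/2012 as 1 October, not 10 January
sales = pd.read_csv(io.StringIO(csv_text),
                    parse_dates=['Order Date', 'Ship Date'],
                    dayfirst=True)

print(sales.dtypes)
# Date columns support arithmetic, e.g. days between order and shipping
print((sales['Ship Date'] - sales['Order Date']).dt.days.tolist())   # [7, 1]
```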


Use the `.describe()` method to summarise the numeric columns.

In [16]:
superstore_sales.describe()

|       | Row ID      | Order ID     | Order Quantity | Sales       | Discount | Profit      | Unit Price | Shipping Cost | Product Base Margin |
|-------|-------------|--------------|----------------|-------------|----------|-------------|------------|---------------|---------------------|
| count | 8399.0      | 8399.0       | 8399.0         | 8399.0      | 8399.0   | 8399.0      | 8399.0     | 8399.0        | 8336.0              |
| mean  | 4200.0      | 29965.179783 | 25.571735      | 1775.878179 | 0.049671 | 181.184424  | 89.346259  | 12.838557     | 0.512513            |
| std   | 2424.726789 | 17260.883447 | 14.481071      | 3585.050525 | 0.031823 | 1196.653371 | 290.354383 | 17.264052     | 0.135589            |
| min   | 1.0         | 3.0          | 1.0            | 2.24        | 0.0      | -14140.7    | 0.99       | 0.49          | 0.35                |
| 25%   | 2100.5      | 15011.5      | 13.0           | 143.195     | 0.02     | -83.315     | 6.48       | 3.3           | 0.38                |
| 50%   | 4200.0      | 29857.0      | 26.0           | 449.42      | 0.05     | -1.5        | 20.99      | 6.07          | 0.52                |
| 75%   | 6299.5      | 44596.0      | 38.0           | 1709.32     | 0.08     | 162.75      | 85.99      | 13.99         | 0.59                |
| max   | 8399.0      | 59973.0      | 50.0           | 89061.05    | 0.25     | 27220.69    | 6783.02    | 164.73        | 0.85                |


Have a look at the summary statistics above and answer the following questions:
* Do any columns have missing values (a count lower than the other columns)?
* What is the mean profit per row?
* What was the maximum discount (%) given to any customer?
* What is the median shipping cost?
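Each of these questions maps to one cell of the `describe()` table, which you can read with `.loc[statistic, column]`. The sketch below uses a small hypothetical frame with the same column names, not the real Superstore values, so the printed numbers are illustrations only. Note that `count` excludes missing values, which is how gaps show up:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with Superstore column names; NOT the real values
toy = pd.DataFrame({
    'Profit': [10.0, -4.0, 30.0, 0.0],
    'Discount': [0.05, 0.10, 0.00, 0.25],
    'Shipping Cost': [3.0, 5.0, 7.0, 100.0],
    'Product Base Margin': [0.5, np.nan, 0.4, 0.6],
})
summary = toy.describe()

# Columns whose count is below the number of rows contain missing values
missing = summary.loc['count'] < len(toy)
print(missing[missing].index.tolist())        # ['Product Base Margin']

print(summary.loc['mean', 'Profit'])          # 9.0
print(summary.loc['max', 'Discount'] * 100)   # 25.0 (fraction -> percent)
print(summary.loc['50%', 'Shipping Cost'])    # 6.0 (the median)
```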