# Quiz: Python for Data Analysts

Congratulations on completing the Python for Data Analysts course! This quiz assesses the practical programming techniques you learned in the course. It is meant to be taken in the classroom; please contact our teaching team if you missed the chance to take it in class.

In [1]:
import pandas as pd

## Data Pre-Processing

We will use the **Companies dataset**, stored as a CSV file named `companies.csv`. It is a sample of CRM (customer relationship management) data that contains the following variables:

- `ID` : unique identifier for each customer
- `Customer Name` : customer name (company name)
- `Consulting Sales` : price for consulting service
- `Software Sales` : price for software service
- `Forecasted Growth` : percentage of company growth in a certain time
- `Returns` : funds returned within a certain time
- `Month` : company founding month
- `Day` : company founding day
- `Year` : company founding year
- `Location` : location of company
- `Account` : type of company

In [2]:
clients = pd.read_csv("data/companies.csv", index_col=0)
clients.head()

| ID | Customer Name | Consulting Sales | Software Sales | Forecasted Growth | Returns | Month | Day | Year | Location | Account |
|---|---|---|---|---|---|---|---|---|---|---|
| 30940 | New Media Group | IDR7125000 | IDR5500000 | 30.00% | IDR1,500,000 | 1 | 10 | 2017 | Jakarta | Enterprise |
| 82391 | Li and Partners | IDR420000 | IDR820000 | 10.00% | IDR400,000 | 6 | 15 | 2016 | Jakarta | Startup |
| 18374 | PT. Kreasi Metrik Solusi | 0 | IDR550403 | 25.00% | 0 | 3 | 29 | 2012 | Surabaya | Enterprise |
| 57531 | PT. Algoritma Data Indonesia | IDR850000 | IDR395500 | 4.00% | 0 | 7 | 17 | 2017 | Jakarta | Startup |
| 19002 | Palembang Konsultansi | IDR2115000 | 0 | -15.00% | 0 | 2 | 24 | 2018 | Bandung | Startup |


Unlike our previous datasets, `clients` has some formatting inconsistencies by design: the `Returns` column contains both the currency prefix (`IDR`) and a comma (`,`) as a thousands separator, whereas the related sales columns omit the separator.

Now let's observe its data types:

In [3]:
clients.dtypes

Customer Name        object
Consulting Sales     object
Software Sales       object
Forecasted Growth    object
Returns              object
Month                 int64
Day                   int64
Year                  int64
Location             object
Account              object
dtype: object

From the results above, we can see that some variables are not stored in the right data type. Can you apply what you have learnt about specifying data types to this new data?

If you try to use the `.astype()` method directly on `Consulting Sales` and `Software Sales`, you will most likely get an error. To perform arithmetic on these columns, we first have to drop the `IDR` currency string so that the values can be treated as numbers. We'll use the built-in pandas `.str.replace()` method for this.

In [4]:
clients['Consulting Sales'].str.replace('IDR','')

ID
30940    7125000
82391     420000
18374          0
57531     850000
19002    2115000
31142     960000
Name: Consulting Sales, dtype: object
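To make the reason for this step concrete, here is a minimal sketch of what happens when `.astype()` meets the currency prefix. The three values are copied from the raw `Consulting Sales` column above; the variable names (`raw`, `cleaned`, `astype_failed`) are only illustrative and are not part of the quiz.

```python
import pandas as pd

# Three raw values copied from the `Consulting Sales` column shown earlier.
# dtype=object mirrors how read_csv stored the column (see clients.dtypes).
raw = pd.Series(["IDR7125000", "IDR420000", "0"], name="Consulting Sales", dtype=object)

# Converting directly fails because 'IDR7125000' is not a valid integer literal.
astype_failed = False
try:
    raw.astype("int64")
except ValueError as err:
    astype_failed = True
    print("astype failed:", err)

# Removing the prefix first makes the conversion possible.
# regex=False treats 'IDR' as a literal string rather than a pattern.
cleaned = raw.str.replace("IDR", "", regex=False).astype("int64")
print(cleaned.tolist())
```

Passing `regex=False` is optional here, but it makes the intent explicit: the text to remove is a plain string, not a regular expression.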

To apply the function to multiple columns, we can use the `apply` method with a `lambda`, as below:

In [5]:
clients[['Consulting Sales','Software Sales','Returns']] =\
clients[['Consulting Sales','Software Sales','Returns']].apply(lambda x: x.str.replace('IDR',''))

In [6]:
clients.head()

| ID | Customer Name | Consulting Sales | Software Sales | Forecasted Growth | Returns | Month | Day | Year | Location | Account |
|---|---|---|---|---|---|---|---|---|---|---|
| 30940 | New Media Group | 7125000 | 5500000 | 30.00% | 1,500,000 | 1 | 10 | 2017 | Jakarta | Enterprise |
| 82391 | Li and Partners | 420000 | 820000 | 10.00% | 400,000 | 6 | 15 | 2016 | Jakarta | Startup |
| 18374 | PT. Kreasi Metrik Solusi | 0 | 550403 | 25.00% | 0 | 3 | 29 | 2012 | Surabaya | Enterprise |
| 57531 | PT. Algoritma Data Indonesia | 850000 | 395500 | 4.00% | 0 | 7 | 17 | 2017 | Jakarta | Startup |
| 19002 | Palembang Konsultansi | 2115000 | 0 | -15.00% | 0 | 2 | 24 | 2018 | Bandung | Startup |


Go on and fill in the blank below to remove the comma (`,`) from `Returns`! \
Example: `clients['Consulting Sales'] = clients['Consulting Sales'].str.replace('IDR','')` removes `IDR` from the `Consulting Sales` column.

In [8]:
## Fill in the blank (___)

clients['Returns'] = clients['Returns'].str.replace(',','')

In [9]:
## Your Code Below:

clients[['Consulting Sales','Software Sales','Returns']] =\
clients[['Consulting Sales','Software Sales','Returns']].astype('int64')

In [10]:
clients.dtypes

Customer Name        object
Consulting Sales      int64
Software Sales        int64
Forecasted Growth    object
Returns               int64
Month                 int64
Day                   int64
Year                  int64
Location             object
Account              object
dtype: object
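The cleaning steps above can also be folded into one small helper. The sketch below is only one possible arrangement, not the quiz's expected answer: `to_number`, `raw`, and `tidy` are hypothetical names, the sample rows are copied from the raw table shown earlier, and converting `Forecasted Growth` to a fraction is an extra step the quiz does not ask for.

```python
import pandas as pd

# A few raw rows copied from the first clients.head() output.
raw = pd.DataFrame(
    {
        "Software Sales": ["IDR5500000", "IDR820000", "IDR550403"],
        "Returns": ["IDR1,500,000", "IDR400,000", "0"],
        "Forecasted Growth": ["30.00%", "10.00%", "25.00%"],
    },
    index=pd.Index([30940, 82391, 18374], name="ID"),
    dtype=object,
)


def to_number(column: pd.Series) -> pd.Series:
    """Strip 'IDR' and ',' from a text column, then parse it as numbers."""
    stripped = column.str.replace(r"IDR|,", "", regex=True)
    return pd.to_numeric(stripped)


tidy = raw.copy()
for col in ["Software Sales", "Returns"]:
    tidy[col] = to_number(tidy[col])

# Assumption: growth is more useful as a fraction (30.00% becomes 0.30).
tidy["Forecasted Growth"] = tidy["Forecasted Growth"].str.rstrip("%").astype(float) / 100

print(tidy.dtypes)
```

Using a regular expression (`IDR|,`) removes the prefix and the separator in one pass, which avoids the separate comma-removal step at the cost of being slightly less explicit.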

---

## Data Analysis

As a Data Analyst, you want to analyze the total sales. To do so, create a new column in the DataFrame and name it `Total Sales`. This column is the sum of `Consulting Sales` and `Software Sales`. Use `head` or `tail` to peek at the resulting DataFrame and confirm that the output matches your expectation.

---
1. What is the sum of the `Total Sales` column? Tip: use the `.sum()` method on the column to add up its values!
        
    - [ ] 11,470,000
    - [x] 19,238,903
    - [ ]  7,768,903
---

In [11]:
## Your Code Below:

# Declare variables
Software_Sales = clients['Software Sales'].sum()
Consulting_Sales = clients['Consulting Sales'].sum()

# Calculate
total_sales = Software_Sales + Consulting_Sales

# Show result
total_sales

19238903
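The answer above sums each column separately. As a sketch of the column-based approach described before question 1, the snippet below builds the `Total Sales` column first and sums it afterwards. The figures are the cleaned values shown in this quiz's outputs; `sales_data` and `grand_total` are illustrative names, and the frame is rebuilt inline only so that the example runs on its own.

```python
import pandas as pd

# Cleaned sales figures for the six clients, copied from the outputs in this quiz.
sales_data = pd.DataFrame(
    {
        "Customer Name": [
            "New Media Group",
            "Li and Partners",
            "PT. Kreasi Metrik Solusi",
            "PT. Algoritma Data Indonesia",
            "Palembang Konsultansi",
            "PT. Surya Citra Manajemen",
        ],
        "Consulting Sales": [7125000, 420000, 0, 850000, 2115000, 960000],
        "Software Sales": [5500000, 820000, 550403, 395500, 0, 503000],
    },
    index=pd.Index([30940, 82391, 18374, 57531, 19002, 31142], name="ID"),
)

# Row-wise sum of the two sales columns, stored as a new column.
sales_data["Total Sales"] = sales_data["Consulting Sales"] + sales_data["Software Sales"]

# Peek at the result, then add up the new column.
print(sales_data.tail(3))
grand_total = sales_data["Total Sales"].sum()
print(grand_total)
```

Both routes give the same grand total; the column-based one also leaves a per-client total that later questions can reuse.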

Having looked at total sales overall, you now want to analyze sales per client for 2017, so for now you will focus on companies that became clients in 2017. Use subsetting to extract the sales data for those companies.

---
2. Which company has the largest `Total Sales` in 2017?

    - [x] New Media Group
    - [ ] PT. Algoritma Data Indonesia
    - [ ] Palembang Konsultansi
---    

In [13]:
# Boolean mask that selects the rows whose Year is 2017
clients2017 = clients["Year"] == 2017
clients[clients2017]

| ID | Customer Name | Consulting Sales | Software Sales | Forecasted Growth | Returns | Month | Day | Year | Location | Account |
|---|---|---|---|---|---|---|---|---|---|---|
| 30940 | New Media Group | 7125000 | 5500000 | 30.00% | 1500000 | 1 | 10 | 2017 | Jakarta | Enterprise |
| 57531 | PT. Algoritma Data Indonesia | 850000 | 395500 | 4.00% | 0 | 7 | 17 | 2017 | Jakarta | Startup |


In [23]:
## Your Code Below:

# Declare New Media Group variables
nmg1 = clients[clients['Customer Name']=="New Media Group"]['Consulting Sales'].sum()
nmg2 = clients[clients['Customer Name']=="New Media Group"]['Software Sales'].sum()
nmg = nmg1+nmg2

# Declare PT. Algoritma Data Indonesia variables
adi1 = clients[clients['Customer Name']=="PT. Algoritma Data Indonesia"]['Consulting Sales'].sum()
adi2 = clients[clients['Customer Name']=="PT. Algoritma Data Indonesia"]['Software Sales'].sum()
adi = adi1+adi2

# condition check
nmg > adi

True

In [33]:
# show result

print("PT. Algoritma Data Indonesia \t= ",adi,"\nNew Media Group \t\t= ",nmg)

PT. Algoritma Data Indonesia 	=  1245500 
New Media Group 		=  12625000
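The comparison above names each company by hand, which works for two rows but does not scale. A more general sketch, assuming the same cleaned data, filters on `Year` and lets `idxmax` pick the row with the largest total. The names `demo`, `in_2017`, `totals`, `top_id`, and `top_name` are illustrative only.

```python
import pandas as pd

# Cleaned values for the six clients, copied from the outputs in this quiz.
demo = pd.DataFrame(
    {
        "Customer Name": [
            "New Media Group",
            "Li and Partners",
            "PT. Kreasi Metrik Solusi",
            "PT. Algoritma Data Indonesia",
            "Palembang Konsultansi",
            "PT. Surya Citra Manajemen",
        ],
        "Consulting Sales": [7125000, 420000, 0, 850000, 2115000, 960000],
        "Software Sales": [5500000, 820000, 550403, 395500, 0, 503000],
        "Year": [2017, 2016, 2012, 2017, 2018, 2019],
    },
    index=pd.Index([30940, 82391, 18374, 57531, 19002, 31142], name="ID"),
)

# Keep only the 2017 clients, then total their two sales columns.
in_2017 = demo[demo["Year"] == 2017]
totals = in_2017["Consulting Sales"] + in_2017["Software Sales"]

# idxmax returns the index label (the ID) of the largest total.
top_id = totals.idxmax()
top_name = in_2017.loc[top_id, "Customer Name"]
print(top_id, top_name, totals.loc[top_id])
```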


Next, we return to the full dataset. The company wants to run a campaign aimed at companies whose sales exceed 1,500,000 IDR. Use subsetting again to find out which companies qualify. It turns out that two companies in the data have sales above 1,500,000 IDR.

---
3. Which companies have sales exceeding 1,500,000 IDR in the data?

    - [ ] Palembang Konsultansi & PT. Surya Citra Manajemen
    - [ ] PT. Surya Citra Manajemen & New Media Group
    - [x] Palembang Konsultansi & New Media Group
---    

In [34]:
## Your Code Below:

sales = clients['Software Sales'] + clients['Consulting Sales']
kondisi = sales > 1500000

clients[kondisi]

| ID | Customer Name | Consulting Sales | Software Sales | Forecasted Growth | Returns | Month | Day | Year | Location | Account |
|---|---|---|---|---|---|---|---|---|---|---|
| 30940 | New Media Group | 7125000 | 5500000 | 30.00% | 1500000 | 1 | 10 | 2017 | Jakarta | Enterprise |
| 19002 | Palembang Konsultansi | 2115000 | 0 | -15.00% | 0 | 2 | 24 | 2018 | Bandung | Startup |


As a Data Analyst, you are asked to analyze the average `Total Sales` across all companies in the data. Several measures of central tendency can be used for this, among them the mean, the median, and the mode. Numeric data can be affected by extreme values (outliers), and the mean is more sensitive to them than the median. The **median** is therefore often the more relevant measure.

---
4. Based on that analysis, what is the central value of `Total Sales`?
    
    - [x] 1,354,250
    - [ ] 1,515,875
    - [ ] 3,737,700
---

In [35]:
## Your Code Below:

# quantile() defaults to q=0.5, which is the median
sales.quantile()

1354250.0
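To illustrate why the median is preferred here, the sketch below compares the mean and the median of the six per-client totals, with and without the largest client. The totals are derived from the cleaned sales figures shown in this quiz; `totals` and `without_largest` are illustrative names.

```python
import pandas as pd

# Per-client totals (Consulting Sales + Software Sales), indexed by ID.
totals = pd.Series(
    [12625000, 1240000, 550403, 1245500, 2115000, 1463000],
    index=pd.Index([30940, 82391, 18374, 57531, 19002, 31142], name="ID"),
    name="Total Sales",
)

# The single very large client pulls the mean far above the typical value.
print("mean  :", totals.mean())
print("median:", totals.median())

# Dropping that client moves the mean a lot and the median only slightly.
without_largest = totals.drop(30940)
print("mean without largest  :", without_largest.mean())
print("median without largest:", without_largest.median())
```

Here the mean is roughly 3.2 million while the median is 1,354,250, and `totals.median()` returns the same value as `totals.quantile()` with its default `q=0.5`.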

Now focus on PT. Algoritma Data Indonesia. Using what you have learned about subsetting, answer the following question.

---
5. Which subsetting method is most appropriate if we want to subset `clients` by explicitly stating the `ID`?

    - [x] `clients.loc[57531, :]`
    - [ ] `clients.iloc[57531, : ]`
    - [ ] `clients[57531, : ]` 
    
---

In [36]:
## Your Code Below:

clients.loc[57531, : ]

Customer Name        PT. Algoritma Data Indonesia
Consulting Sales                           850000
Software Sales                             395500
Forecasted Growth                           4.00%
Returns                                         0
Month                                           7
Day                                            17
Year                                         2017
Location                                  Jakarta
Account                                   Startup
Name: 57531, dtype: object
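The difference between the options comes down to labels versus positions. The following sketch, built on a small stand-in frame (`demo` is a hypothetical name, with IDs copied from the data above), shows that `.loc` looks up the index label while `.iloc` expects a row position.

```python
import pandas as pd

# Stand-in for `clients`: the ID column is the index, as with index_col=0.
demo = pd.DataFrame(
    {
        "Customer Name": [
            "New Media Group",
            "Li and Partners",
            "PT. Kreasi Metrik Solusi",
            "PT. Algoritma Data Indonesia",
        ],
        "Location": ["Jakarta", "Jakarta", "Surabaya", "Jakarta"],
    },
    index=pd.Index([30940, 82391, 18374, 57531], name="ID"),
)

# .loc selects by index label, so the ID can be used directly.
by_label = demo.loc[57531, :]

# .iloc selects by integer position; the same row sits at position 3.
by_position = demo.iloc[3, :]

# Passing the ID to .iloc asks for row number 57531, which does not exist.
iloc_failed = False
try:
    demo.iloc[57531, :]
except IndexError as err:
    iloc_failed = True
    print("iloc failed:", err)

print(by_label["Customer Name"])
```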

Lastly, subsetting is needed to analyze clients that are located in Jakarta and have an Enterprise account. Fill in the blanks in the code below to perform the right conditional subsetting, then answer the question that follows.

```
clients[________ _ ________]
```

---
6. Which of the expressions below correctly completes the conditional subsetting?

    - [ ] `(clients.Location == "Jakarta") | (clients.Account == "Enterprise")`
    - [ ] `clients.Location == "Jakarta" & clients.Account == "Enterprise"`
    - [x] `(clients.Location == "Jakarta") & (clients.Account == "Enterprise")`
    - [ ] `clients.Location == "Jakarta" | clients.Account == "Enterprise"`
---

In [37]:
## Your Code Below:

clients[(clients.Location == "Jakarta") & (clients.Account == "Enterprise")]

| ID | Customer Name | Consulting Sales | Software Sales | Forecasted Growth | Returns | Month | Day | Year | Location | Account |
|---|---|---|---|---|---|---|---|---|---|---|
| 30940 | New Media Group | 7125000 | 5500000 | 30.00% | 1500000 | 1 | 10 | 2017 | Jakarta | Enterprise |
| 31142 | PT. Surya Citra Manajemen | 960000 | 503000 | 19.00% | 0 | 1 | 19 | 2019 | Jakarta | Enterprise |
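As a rough illustration of why the parentheses and the choice of operator both matter, the sketch below rebuilds the two relevant columns (values copied from the tables in this quiz) and compares `&` with `|`. The names `demo`, `both`, and `either` are illustrative only.

```python
import pandas as pd

# Location and Account for the six clients, copied from the quiz outputs.
demo = pd.DataFrame(
    {
        "Location": ["Jakarta", "Jakarta", "Surabaya", "Jakarta", "Bandung", "Jakarta"],
        "Account": ["Enterprise", "Startup", "Enterprise", "Startup", "Startup", "Enterprise"],
    },
    index=pd.Index([30940, 82391, 18374, 57531, 19002, 31142], name="ID"),
)

# `&` and `|` bind more tightly than `==` in Python, so each comparison
# needs its own parentheses. Without them, the expression is parsed as
# Location == ("Jakarta" & Account) == "Enterprise", which is not the
# intended pair of comparisons.
both = (demo.Location == "Jakarta") & (demo.Account == "Enterprise")
either = (demo.Location == "Jakarta") | (demo.Account == "Enterprise")

# AND keeps rows meeting both conditions; OR keeps rows meeting at least one.
print(demo[both].index.tolist())
print(demo[either].index.tolist())
```

Only the `&` version matches the request for clients that are in Jakarta and also hold an Enterprise account; the `|` version additionally returns every other Jakarta client and every other Enterprise client.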
