# Quiz: Python for Data Analysts

Congratulations on completing the Python for Data Analysts course! This assessment quiz tests the practical programming techniques you have learned in the course. The quiz is expected to be taken in the classroom; please contact our teaching team if you missed the chance to take it in class.

In [None]:
import pandas as pd

## Data Pre-Processing

We will use the **Companies dataset**, stored as a CSV file at `data/companies.csv`. It contains sample CRM (customer relationship management) data with the following variables:

- `ID` : unique identifier for each customer
- `Customer Name` : customer name (company name)
- `Consulting Sales` : sales value of consulting services (IDR)
- `Software Sales` : sales value of software services (IDR)
- `Forecasted Growth` : percentage of company growth over a certain period
- `Returns` : funds returned within a certain period
- `Month` : company founding month
- `Day` : company founding day
- `Year` : company founding year
- `Location` : location of the company
- `Account` : account type of the company (e.g. `Enterprise`)

In [None]:
clients = pd.read_csv("data/companies.csv", index_col=0)
clients.head()

Unlike our previous datasets, `clients` has some formatting inconsistencies by design: the `Returns` column contains both a comma delimiter (`,`) and the currency label (`IDR`), whereas the related sales columns carry the currency label but omit the separator.

Now let's observe its data types:

In [None]:
clients.dtypes

From the results above, we can see that some variables are not stored with the correct data type. Can you apply what you have learned about specifying data types to this new data?

In [None]:
## Your Code Below:


If you tried to use the `.astype()` method directly on `Consulting Sales` and `Software Sales`, you most likely got an error. To perform arithmetic computations on these columns, we have to drop the `IDR` currency string and treat the columns as numbers. We'll use the pandas built-in `.str.replace()` method for this.
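
To illustrate that error, here is a minimal sketch on a small hypothetical Series. It assumes an `IDR <amount>` layout for the values, which may not match `companies.csv` exactly. Casting the raw strings with `.astype("int64")` fails, while the same cast succeeds once the currency label and surrounding spaces are removed.

```python
import pandas as pd

# Hypothetical values assuming an "IDR <amount>" layout; not real data.
raw = pd.Series(["IDR 150000", "IDR 275000"])

try:
    raw.astype("int64")  # "IDR 150000" cannot be parsed as an integer
    cast_failed = False
except (ValueError, TypeError) as err:
    cast_failed = True
    print("astype failed:", err)

# Remove the currency label, trim leftover spaces, then cast to integers.
cleaned = raw.str.replace("IDR", "").str.strip().astype("int64")
print(cleaned.dtype)
```

The `.str.strip()` step is a precaution for the assumed space between the label and the number; it does nothing if no spaces are present.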

In [None]:
clients['Consulting Sales'].str.replace('IDR','')

To apply the replacement to multiple columns at once, we can use the `apply` method with a `lambda` function, as below:

In [None]:
clients[['Consulting Sales','Software Sales','Returns']] =\
clients[['Consulting Sales','Software Sales','Returns']].apply(lambda x: x.str.replace('IDR',''))

In [None]:
clients.head()
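
The same clean-up can be chained with a type cast across several columns in a single `apply` call. This is a sketch on a hypothetical toy DataFrame (again assuming an `IDR <amount>` layout): each column reaches the `lambda` as a Series, so the `.str` accessor is available, and the result is written back to the same columns as integers.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the dataset, values are made up.
toy = pd.DataFrame({
    "Consulting Sales": ["IDR 150000", "IDR 275000"],
    "Software Sales": ["IDR 50000", "IDR 125000"],
})
cols = ["Consulting Sales", "Software Sales"]

# Strip the label and spaces from every listed column, then cast to int64.
toy[cols] = toy[cols].apply(
    lambda col: col.str.replace("IDR", "").str.strip().astype("int64")
)

print(toy.dtypes)
```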

Now fill in the blanks below to remove the comma (`,`) from `Returns`!

In [None]:
## Fill in the blank (___):

_____ = clients['Returns'].str.replace(_____,_____)

---

## Data Analysis

As a Data Analyst, you want to analyze the total sales. To do so, create a new column in the DataFrame named `Total Sales`, which is the sum of `Consulting Sales` and `Software Sales`. Use `head()` or `tail()` to peek at the resulting DataFrame and confirm that the output matches your expectations.
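
Adding two numeric columns in pandas works element-wise, row by row, and assigning the result to a new name creates the column. A minimal sketch on hypothetical, already-cleaned integer columns:

```python
import pandas as pd

# Hypothetical cleaned values; not taken from companies.csv.
toy = pd.DataFrame({
    "Consulting Sales": [150000, 275000, 90000],
    "Software Sales": [50000, 125000, 10000],
})

# Element-wise addition: each row's two sales values are summed.
toy["Total Sales"] = toy["Consulting Sales"] + toy["Software Sales"]

print(toy.head())
print(toy["Total Sales"].sum())  # 700000
```

This only works after both columns hold numbers; on the original strings, `+` would concatenate text instead of adding values.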

---
1. What is the sum of the `Total Sales` column? Tip: Use the `.sum()` method on the column to accumulate the total value!

    - [ ] 11,470,000
    - [ ] 19,238,903
    - [ ] 7,768,903
---

In [None]:
## Your Code Below:


Based on the total sales obtained each year, you now want to focus on the sales of each client in 2017, that is, the companies that became clients in 2017. Use subsetting to get the sales data for 2017.
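
Filtering rows by year is a boolean mask. The sketch below assumes the year lives in an integer `Year` column and uses hypothetical company names and values; it keeps only the 2017 rows and sorts them by `Total Sales` so the largest value comes first.

```python
import pandas as pd

# Hypothetical toy data; names and values are placeholders.
toy = pd.DataFrame({
    "Customer Name": ["Company A", "Company B", "Company C", "Company D"],
    "Year": [2016, 2017, 2017, 2018],
    "Total Sales": [500000, 820000, 1300000, 950000],
})

# Boolean mask: True only for rows whose Year equals 2017.
sales_2017 = toy[toy["Year"] == 2017]

# Sort in descending order so the biggest Total Sales is at the top.
ranked = sales_2017.sort_values("Total Sales", ascending=False)
print(ranked.head())
```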

---
2. Which company has the highest `Total Sales` in 2017?

    - [ ] New Media Group
    - [ ] PT. Algoritma Data Indonesia
    - [ ] Palembang Konsultansi
---    

In [None]:
## Your Code Below:


Next, let's return to the full dataset. The company wants to run a campaign targeting companies whose sales value exceeds 1,500,000 IDR. Use subsetting again to find out which companies have sales above 1,500,000 IDR. There are two such companies in the data.
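
A threshold filter follows the same pattern with a comparison operator. In this sketch (hypothetical values, assuming `Total Sales` is the sales value being compared), only rows strictly above 1,500,000 are kept and the matching names are pulled out as a list.

```python
import pandas as pd

# Hypothetical toy data; names and values are placeholders.
toy = pd.DataFrame({
    "Customer Name": ["Company A", "Company B", "Company C"],
    "Total Sales": [1200000, 1500000, 2100000],
})

# Strictly greater than: a value of exactly 1,500,000 is excluded.
above = toy[toy["Total Sales"] > 1_500_000]
names = above["Customer Name"].tolist()
print(names)  # ['Company C']
```

Note that "exceeding" means `>`; using `>=` would also keep a company with exactly 1,500,000.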

---
3. Which companies have sales exceeding 1,500,000 IDR in the data?

    - [ ] Palembang Konsultansi & PT. Surya Citra Manajemen
    - [ ] PT. Surya Citra Manajemen & New Media Group
    - [ ] Palembang Konsultansi & New Media Group
---    

In [None]:
## Your Code Below:


As a Data Analyst, you are asked to analyze the typical value of `Total Sales` across all companies in the data. Several measures of central tendency can be used, including the mean, median, and mode. Because `Total Sales` is numeric, it can contain extreme values (outliers). The mean is easily pulled toward outliers, so the **median** is often the more relevant measure.
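
To see why the median resists outliers, compare both measures on a small hypothetical Series in which one value is far larger than the rest:

```python
import pandas as pd

# Hypothetical sales values with one extreme outlier (5000).
values = pd.Series([100, 120, 130, 150, 5000])

print(values.mean())    # 1100.0, pulled up by the outlier
print(values.median())  # 130.0, the middle value, barely affected
```

One extreme value lifts the mean above every ordinary observation, while the median stays at the middle of the sorted data.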

---
4. Based on this analysis, what is the central value of `Total Sales`?
    
    - [ ] 1,354,250
    - [ ] 1,515,875
    - [ ] 3,737,700
---

In [None]:
## Your Code Below:


Now you want to focus on PT. Algoritma Data Indonesia. Using what you have learned about subsetting methods, answer the following question.

---
5. Which subsetting method is most appropriate if we want to subset `clients` by explicitly stating the `ID`?

    - [ ] `clients.loc[57531, :]`
    - [ ] `clients.iloc[57531, :]`
    - [ ] `clients[57531, :]`
    
---

In [None]:
## Your Code Below:


Lastly, we need subsetting to analyze clients that are located in Jakarta and have an Enterprise account. Try to fill in the blanks in the code below to perform the correct conditional subsetting, then answer the question that follows.

```
clients[________ _ ________]
```

---
6. Based on the code you completed, which of the options below correctly completes the conditional subsetting?

    - [ ] `(clients.Location == "Jakarta") | (clients.Account == "Enterprise")`
    - [ ] `clients.Location == "Jakarta" & clients.Account == "Enterprise"`
    - [ ] `(clients.Location == "Jakarta") & (clients.Account == "Enterprise")`
    - [ ] `clients.Location == "Jakarta" | clients.Account == "Enterprise"`
---

In [None]:
## Your Code Below:
