# Questions
1. Geography and Name columns need to be renamed to Country and Full_Name respectively
2. How many unique countries do customers purchase from?
3. What are the Countries?
4. Which Country has most customers?
5. What is the percentage of Female customers?
6. Which age group has the highest number of customers? - Choose 7 Groups

# Basic Operations in Pandas DataFrames (part2)

In [1]:
#Import Packages
import pandas as pd

In [22]:
#Import Dataset
customers = pd.read_csv('Retail_Data_Customers.csv')

## Geography and Name columns need to be renamed to Country and Full_Name respectively
**Concepts covered:**
1. Change column names using (.rename())
2. Change column names by reassigning dataframe
3. Renaming more than one column

### Change column names using (.rename())

In [14]:
#Let's check what columns exist
customers.columns

Index(['customer_id', 'Name', 'Geography', 'Gender', 'Age'], dtype='object')

In [15]:
#Let's change the column name Geography
customers.rename(columns={'Geography' : 'Country'})

      customer_id        Name  Country  Gender   Age
0          CS1100  Hargrave K   France  Female  42.0
1          CS1101      Hill V    Spain  Female   NaN
2          CS1102      Onio B   France  Female  42.0
3          CS1103      Boni H   France  Female  39.0
4          CS1104  Mitchell W    Spain  Female  43.0
...           ...         ...      ...     ...   ...
6896       CS8996       Liu J   France    Male  42.0
6897       CS8997  Sokolova P   France  Female  48.0
6898       CS8998  Bancroft A    Spain    Male  41.0
6899       CS8999    Nnanna G   France    Male  66.0


In [16]:
#Has the change happened? No: rename() returns a new DataFrame, and we neither passed (inplace = True) nor reassigned the result
customers.columns

Index(['customer_id', 'Name', 'Geography', 'Gender', 'Age'], dtype='object')

In [17]:
#We can either use the 'inplace' argument or reassign the outcome to the same dataframe
customers.rename(columns={'Geography' : 'Country'}, inplace = True)

In [18]:
#Now the change is permanent
customers.columns

Index(['customer_id', 'Name', 'Country', 'Gender', 'Age'], dtype='object')
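
The behaviour seen above (the first call leaving `customers` unchanged) can be reproduced on a tiny frame. This is a minimal sketch: `demo` is a hypothetical three-row stand-in for the real CSV, not the actual dataset.

```python
import pandas as pd

# Hypothetical three-row stand-in for Retail_Data_Customers.csv
demo = pd.DataFrame({
    "customer_id": ["CS1100", "CS1101", "CS1102"],
    "Name": ["Hargrave K", "Hill V", "Onio B"],
    "Geography": ["France", "Spain", "France"],
})

# Without inplace=True, rename() returns a new DataFrame
# and leaves `demo` itself untouched
renamed = demo.rename(columns={"Geography": "Country"})

original_columns = list(demo.columns)
renamed_columns = list(renamed.columns)

print(original_columns)  # still contains 'Geography'
print(renamed_columns)   # contains 'Country' instead
```

Keeping the result in a new variable, as here, is one way to compare the frame before and after the rename.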

### Change column names by reassigning dataframe

In [23]:
#Now let's try to change the column name using the reassign method
customers = customers.rename(columns={'Name' : 'Full_Name'})

In [26]:
#Let's look at the change
customers.columns

Index(['customer_id', 'Full_Name', 'Country', 'Gender', 'Age'], dtype='object')

### Renaming more than one column

In [25]:
#Of course, we can rename both columns in a single call, using either method
customers.rename(columns={'Geography' : 'Country', 'Name' : 'Full_Name'}, inplace = True)
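
One thing worth knowing when passing several names at once: by default, `.rename()` silently skips keys that are not present in the DataFrame, so a typo or an already-renamed column goes unnoticed. The sketch below, on a hypothetical one-row `demo` frame, shows this and the `errors="raise"` option that turns the situation into a `KeyError`.

```python
import pandas as pd

# Hypothetical one-row frame with the original column names
demo = pd.DataFrame({"Name": ["Hargrave K"], "Geography": ["France"]})

mapping = {"Geography": "Country", "Name": "Full_Name"}
demo = demo.rename(columns=mapping)

# Second call: the old names no longer exist. With the default
# errors="ignore", nothing happens and no error is raised.
again = demo.rename(columns=mapping)
unchanged = list(again.columns) == list(demo.columns)

# With errors="raise", missing names produce a KeyError instead
try:
    demo.rename(columns=mapping, errors="raise")
    raised = False
except KeyError:
    raised = True

print(list(demo.columns), unchanged, raised)
```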

## How many unique countries do customers purchase from?
**Concepts covered:**
1. Getting the count of unique values in a column (.nunique())

### Getting the count of unique values in a column (.nunique())

In [29]:
#This is very similar to (SELECT COUNT(DISTINCT Country)) in SQL
customers['Country'].nunique()

3
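
Missing values affect this count: `.nunique()` ignores them by default, and `dropna=False` counts them as one extra value. A small sketch on a made-up Series (the values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Made-up country values with one missing entry
countries = pd.Series(["France", "Spain", None, "France", "Germany"])

# Default: missing values are not counted
distinct = countries.nunique()

# dropna=False: the missing value counts as one more distinct value
distinct_with_missing = countries.nunique(dropna=False)

print(distinct, distinct_with_missing)
```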

## What are the Countries?
**Concepts covered:**
1. Extracting list of unique values using (.unique())

### Extracting list of unique values using (.unique())

In [30]:
#This is very similar to (SELECT DISTINCT Country) in SQL
customers['Country'].unique()

array(['France', 'Spain', 'Germany'], dtype=object)
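
Note that `.unique()` returns an array with values in order of first appearance, not sorted. If a plain, sorted Python list is more convenient, something like the following sketch may help (the Series here is illustrative only):

```python
import pandas as pd

# Illustrative values; order of first appearance is France, Spain, Germany
countries = pd.Series(["France", "Spain", "France", "Germany", "Spain"])

values = countries.unique()

# Convert to a plain list, then sort alphabetically if needed
as_list = list(values)
sorted_list = sorted(as_list)

print(as_list)
print(sorted_list)
```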

## Which Country has most customers?
**Concepts covered:**
1. Getting number of occurrences of unique values (.value_counts())

### Getting number of occurrences of unique values

In [36]:
#Let's find the number of customers in each Country
customers['Country'].value_counts()

France     3466
Spain      1724
Germany    1711
Name: Country, dtype: int64
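
Since `.value_counts()` sorts in descending order by default, the first row answers the question. To pull the answer out programmatically rather than reading it off the output, `.idxmax()` on the counts is one option. A minimal sketch with made-up values:

```python
import pandas as pd

# Made-up country values; France appears most often
countries = pd.Series(["France", "Spain", "France", "Germany", "France", "Spain"])

counts = countries.value_counts()

# idxmax() returns the index label (the country) with the largest count
top_country = counts.idxmax()
top_count = int(counts.max())

print(top_country, top_count)
```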

## What is the percentage of Female customers?
**Concepts covered:**
1. Using .value_counts() to find ratio

In [52]:
#Let's find the number of Customers by Gender
gender_count = customers['Gender'].value_counts()
gender_count

Male      3739
Female    3162
Name: Gender, dtype: int64

In [53]:
#Dividing the count of Female customers by the total gives their proportion (multiply by 100 for a percentage, about 45.8% here)
gender_count['Female']/gender_count.sum()

0.4581944645703521
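
As an alternative to dividing by hand, `.value_counts(normalize=True)` returns proportions directly. The sketch below uses a made-up four-row Series so the arithmetic is easy to check.

```python
import pandas as pd

# Made-up values: 1 Female out of 4 customers
genders = pd.Series(["Male", "Female", "Male", "Male"])

# normalize=True returns each value's share of the total instead of raw counts
shares = genders.value_counts(normalize=True)

# Multiply by 100 to express the share as a percentage
female_pct = shares["Female"] * 100

print(female_pct)
```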

## Which age group has the highest number of customers? - Choose 7 Groups
**Concepts covered:**
1. Getting number of occurrences by binning numerical variables using (bins = xx) argument

### Getting number of occurrences by binning numerical variables using (bins = xx) argument

In [40]:
#Let's find the number of customers in each of 7 equal-width age groups
customers['Age'].value_counts(bins=7)

(28.571, 39.143]                3197
(39.143, 49.714]                1775
(17.924999999999997, 28.571]     900
(49.714, 60.286]                 642
(60.286, 70.857]                 227
(70.857, 81.429]                  82
(81.429, 92.0]                     7
Name: Age, dtype: int64
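
The output above is sorted by count, which puts the age groups out of order. Passing `sort=False` keeps the bins in ascending age order, and `.idxmax()` returns the busiest bin as an interval. A minimal sketch on ten made-up ages (not the real data):

```python
import pandas as pd

# Ten made-up ages spanning 18 to 92
ages = pd.Series([18, 22, 31, 33, 35, 38, 44, 52, 67, 92])

# Default: bins sorted by count, largest first
by_count = ages.value_counts(bins=7)

# sort=False: bins stay in ascending age order
by_age = ages.value_counts(bins=7, sort=False)

# The interval (age group) holding the most customers
largest_bin = by_count.idxmax()

print(by_age)
print(largest_bin)
```

Empty bins still appear with a count of 0, so the result always has as many rows as requested bins.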

# END
**Pandas Concepts Covered:**
1. Renaming Columns
2. (.nunique()) to get count of unique values in a column
3. (.unique()) to get the unique values in a column
4. (.value_counts()) to get the occurrences of unique values in a column
5. Using (.value_counts()) to do binning