<p align="center">
<img src="https://github.com/adelnehme/introduction-to-python/blob/master/assets/hsbc_datacamp.png?raw=True" width="65%">
</p>

<br>

## **Introduction to Python Learning Session**



#### **Learning Objectives**

- Understand the value of Python on the path to developing data fluency
- Import data into Python using `pandas`, Python’s most popular data analysis package
- Filter, add new columns, and analyze datasets using `pandas`
- Present data visualizations using `Matplotlib` and `Seaborn`, Python’s most popular data visualization packages
- Discuss the long-term benefits and use cases of data work with Python

#### **The Dataset**

The dataset to be used in this session is a CSV file named `telco_churn.csv`, which contains data on telecom customers churning and some of their key behaviors. It contains the following columns:


- `customerID`: Unique identifier of a customer.
- `gender`: Gender of customer.
- `SeniorCitizen`: Binary variable indicating if customer is senior citizen.
- `Partner`: Binary variable indicating if customer has a partner.
- `tenure`: Number of weeks as a customer.
- `State`: State customer is in.
- `PhoneService`: Whether customer has phone service.
- `MultipleLines`: Whether customer has multiple lines.
- `InternetService`: What type of internet service customer has (`"DSL"`, `"Fiber optic"`, `"No"`).
- `OnlineSecurity`: Whether customer has online security service.
- `OnlineBackup`: Whether customer has online backup service.
- `DeviceProtection`: Whether customer has device protection service.
- `TechSupport`: Whether customer has tech support service.
- `StreamingTV`: Whether customer has TV streaming service.
- `StreamingMovies`: Whether customer has movies streaming service.
- `PaymentMethod`: Payment method.
- `MonthlyCharges`: Amount of monthly charges in $.
- `TotalCharges`: Amount of total charges so far.
- `Churn`: Whether customer `'Stayed'` or `'Churned'`.


#### **Questions to answer**

- **Question 1:** What is the number of churners vs non-churners?
- **Question 2:** What is the breakdown of tenure distribution by churners and non-churners?
- **Question 3:** What is the average monthly charge for customers in New York?
- **Question 4:** What is the monthly charge distribution for customers in New York by churn status?

## **Getting started**

In [None]:
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

*To import a CSV file into* `pandas`, *we use* `data = pd.read_csv(file_path)`. *Check out this [documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html) for importing other file formats.*
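*As a minimal sketch of the same call, the example below reads CSV text held in memory instead of a file path. The rows and the name `toy` are hypothetical and only illustrate the pattern; they are not taken from `telco_churn.csv`.*

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk or at a URL
csv_text = "customerID,tenure,Churn\nA-001,12,Stayed\nA-002,3,Churned\n"

# pd.read_csv accepts a path, a URL, or a file-like object such as StringIO
toy = pd.read_csv(io.StringIO(csv_text))

# Number of rows and columns that were read
print(toy.shape)
```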


In [None]:
# Read in the dataset
telco = pd.read_csv('https://github.com/adelnehme/introduction-to-python/blob/master/data/telco_churn.csv?raw=true')

*Some common methods needed to get a better understanding of your DataFrames:*

- `data.head()` *prints the first 5 rows*
- `data.describe()` *prints summary statistics (count, mean, quartiles, etc.) of numeric columns*
- `data.info()` *prints the non-null counts and data types of columns*
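*The three methods above can be tried on a small DataFrame first. This is a sketch on hypothetical toy data (the values and the name `toy` are assumptions, not rows from the session dataset).*

```python
import pandas as pd

# Hypothetical toy data, not rows from telco_churn.csv
toy = pd.DataFrame({
    "customerID": ["A-001", "A-002", "A-003"],
    "tenure": [12, 3, 40],
    "Churn": ["Stayed", "Churned", "Stayed"],
})

first_rows = toy.head(2)   # first 2 rows (the default is 5)
summary = toy.describe()   # summary statistics for numeric columns only
toy.info()                 # prints data types and non-null counts per column

print(first_rows)
print(summary)
```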

In [None]:
# Inspect header of dataset


In [None]:
# Check description of dataset


In [None]:
# Check information of dataset




---

<center><h1> Q&A 1</h1> </center>

---





## **Data Cleaning**

##### **Task 1:** Remove `$` from `MonthlyCharges` and convert it to `float`

_To remove a character from a string column, we can use:_

```
data['column_name'] = data['column_name'].str.strip(character)
```

_To convert a numeric column to a float, we can use:_

```
data['column_name'] = data['column_name'].astype('float')
```
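_A minimal sketch of both steps on hypothetical values. It assumes the `$` sits at the start of each string; `.str.strip()` removes the character from either end._

```python
import pandas as pd

# Hypothetical charges stored as text with a leading $
toy = pd.DataFrame({"MonthlyCharges": ["$29.85", "$56.95", "$53.85"]})

# Remove the $ from both ends of each string
toy["MonthlyCharges"] = toy["MonthlyCharges"].str.strip("$")

# Convert the cleaned strings to floating-point numbers
toy["MonthlyCharges"] = toy["MonthlyCharges"].astype("float")

print(toy.dtypes)
```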


In [None]:
# Strip $ from MonthlyCharges column


# Convert MonthlyCharges column to float

# Print header again


##### **Task 2:** Split `State` into 2 columns

_To split a column into 2 columns on the `,` separator, we can use:_

```
split_data = data['column_name'].str.split(',', expand = True)
```
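_A minimal sketch of the split, assuming values shaped like `"State,Country"`. The exact format in the session dataset may differ, and the toy values are hypothetical._

```python
import pandas as pd

# Hypothetical values combining state and country in one string
toy = pd.DataFrame({"State": ["New York,US", "California,US"]})

# expand=True returns a DataFrame with one column per piece (labelled 0, 1, ...)
split_data = toy["State"].str.split(",", expand=True)

# Overwrite State with the first piece and store the second piece as Country
toy["State"] = split_data[0]
toy["Country"] = split_data[1]

print(toy)
```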


In [None]:
# Split State column into two




In [None]:
# Replace updated State and create Country column




##### **Task 3:** Drop rows with missing `customerID` values

_To count and drop missing values in a DataFrame, you can use the following:_

- `data.isna().sum()` _to count missing values_
- `data.dropna(subset = [column_names])` _drops rows with missing values in specific columns_
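_A minimal sketch of counting and dropping missing values on hypothetical toy data. Note that `dropna` returns a new DataFrame, so the result is assigned back._

```python
import pandas as pd

# Hypothetical toy data with one missing customerID
toy = pd.DataFrame({
    "customerID": ["A-001", None, "A-003"],
    "tenure": [12, 3, 40],
})

# Count missing values per column before cleaning
missing_before = toy.isna().sum()

# Keep only rows where customerID is present
toy = toy.dropna(subset=["customerID"])

# Count missing values again
missing_after = toy.isna().sum()

print(missing_before)
print(missing_after)
```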

In [None]:
# Drop missing values from customerID in Telco


# Count missing values again


##### **Task 4:** Replace missing values in `OnlineSecurity` with `'No'`

_To fill missing values in a specific column in a DataFrame, you can use the following:_

```
data['column_name'] = data['column_name'].fillna('value')
```
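_A minimal sketch of filling missing values on a hypothetical toy column:_

```python
import pandas as pd

# Hypothetical toy column with two missing entries
toy = pd.DataFrame({"OnlineSecurity": ["Yes", None, "No", None]})

# Replace every missing value in the column with 'No'
toy["OnlineSecurity"] = toy["OnlineSecurity"].fillna("No")

# Confirm that no missing values remain
print(toy["OnlineSecurity"].isna().sum())
```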

In [None]:
# Replace missing values from OnlineSecurity in Telco with 'No'


# Count missing values again




---

<center><h1> Q&A 2</h1> </center>

---





## **Data Analysis and Visualization**

**Question 1:** What are the number of churners vs non-churners?

_A convenient way of counting the different values in a column using_ `pandas` _is:_

```
data['column_name'].value_counts()
```

_Note that here the method is called on a single **column** (a pandas `Series`)._
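_A minimal sketch on hypothetical toy data. Passing `normalize=True` returns proportions instead of counts, which is one way to get percentages._

```python
import pandas as pd

# Hypothetical toy data, not rows from telco_churn.csv
toy = pd.DataFrame({"Churn": ["Stayed", "Churned", "Stayed", "Stayed"]})

# Number of rows per category
counts = toy["Churn"].value_counts()

# Share of rows per category (values sum to 1)
shares = toy["Churn"].value_counts(normalize=True)

print(counts)
print(shares)
```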

In [None]:
# Count the number of churners vs non-churners


In [None]:
# Count the percentage of churners vs non-churners


_To easily show the count of observations in each category using_ `seaborn` _we can use:_

- `sns.countplot(x = , data = )`
  - `x`: _categorical column name on x-axis_
  - `data`: _data being used_
- `plt.title()`:  _sets plot title_
- `plt.xlabel()`: _sets x-axis label_
- `plt.ylabel()`: _sets y-axis label_
- `plt.show()`:  _shows plot_
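_The calls listed above can be combined as in the sketch below. The toy data and the label text are hypothetical, and the `matplotlib.use("Agg")` line is only there so the example also runs outside a notebook._

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, not rows from telco_churn.csv
toy = pd.DataFrame({"Churn": ["Stayed", "Churned", "Stayed", "Stayed"]})

# One bar per category; bar height is the number of rows in that category
ax = sns.countplot(x="Churn", data=toy)

# Set title and axis labels on the current plot
plt.title("Customers by churn status")
plt.xlabel("Churn status")
plt.ylabel("Number of customers")

# Display the plot
plt.show()
```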

In [None]:
# Plot the number of churners vs non-churners


# Set titles and labels of plot

# Show plot




---

<center><h1> Q&A 3</h1> </center>

---





**Question 2:** What is the breakdown of tenure distribution by churners and non-churners?

<p align="center">
<img src="https://github.com/adelnehme/introduction-to-python/blob/master/assets/boxplot.png?raw=true" alt = "boxplot" width="65%">
</p>

_To create a boxplot using_ `seaborn` _we can use:_

- `sns.boxplot(x = , y = , data = )`
  - `x`: _column name on x-axis_
  - `y`: _column name on y-axis_
  - `data`: _data being used_
- `plt.title()`: _sets plot title_
- `plt.xlabel()`: _sets x-axis label_
- `plt.ylabel()`: _sets y-axis label_
- `plt.show()`: _shows plot_
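_A minimal sketch of a boxplot built from the calls above. The toy data, figure size, and label text are hypothetical, and the `matplotlib.use("Agg")` line is only there so the example also runs outside a notebook._

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, not rows from telco_churn.csv
toy = pd.DataFrame({
    "Churn": ["Stayed", "Churned", "Stayed", "Churned", "Stayed", "Churned"],
    "tenure": [40, 3, 55, 8, 62, 5],
})

# Set figure size (width, height in inches) before drawing
plt.figure(figsize=(8, 5))

# One box per churn category, summarizing the tenure values in that category
ax = sns.boxplot(x="Churn", y="tenure", data=toy)

# Set title and axis labels on the current plot
plt.title("Tenure by churn status")
plt.xlabel("Churn status")
plt.ylabel("Tenure")

# Display the plot
plt.show()
```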

In [None]:
# Set figure size for easy viewing


# Create plot

# Set titles and labels of plot

# Show plot




---

<center><h1> Q&A 4</h1> </center>

---





**Question 3:** What is the average monthly charge for customers in New York?


*To filter a DataFrame, we can use the* `.loc[]` *indexer, which takes a row condition and a column condition:*

```
data.loc[row condition, column condition]
```

*To filter on a row condition only, we can use:*

```
data.loc[row condition]
```
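*A minimal sketch of both forms on hypothetical toy data. It assumes the state is stored as the plain string `"New York"`, and the column values are invented for illustration.*

```python
import pandas as pd

# Hypothetical toy data, not rows from telco_churn.csv
toy = pd.DataFrame({
    "State": ["New York", "California", "New York"],
    "MonthlyCharges": [30.0, 50.0, 70.0],
    "Churn": ["Stayed", "Churned", "Churned"],
})

# Row condition only: all columns for rows where State is New York
ny_rows = toy.loc[toy["State"] == "New York"]

# Row and column condition: only MonthlyCharges for those rows
ny_charges = toy.loc[toy["State"] == "New York", "MonthlyCharges"]

# Average of the selected column
print(ny_charges.mean())
```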

In [None]:
# Isolate values of telco where State is New York



**Question 4:** What is the monthly charge distribution for customers in New York by churn status?

In [None]:
# Create a telco new york DataFrame


# Show output


_To create a boxplot using_ `seaborn` _we can use:_

- `sns.boxplot(x = , y = , data = )`
  - `x`: _column name on x-axis_
  - `y`: _column name on y-axis_
  - `data`: _data being used_
- `plt.title()`: _sets plot title_
- `plt.xlabel()`: _sets x-axis label_
- `plt.ylabel()`: _sets y-axis label_
- `plt.show()`: _shows plot_

In [None]:
# Visualize the results


# Customize output




---

<center><h1> Q&A 5</h1> </center>

---



