# Data Visualization Notebook

## Objectives

*   Answer business requirement 1:
    * As a customer, I am interested in understanding the patterns in my customer base so that I can better manage churn levels.


## Inputs

* outputs/datasets/collection/TelcoCustomerChurn.csv

## Outputs

* Generate code that answers business requirement 1 and can be used to build the Streamlit app


## Additional Comments | Insights | Conclusions




---

# Install Packages

In [None]:
! pip install pandas-profiling==2.11.0
! pip install plotly==4.14.0
! pip install feature-engine==1.0.2

# Restart the runtime, which restarts the Colab session.
# This is good practice after installing packages in a Colab session.
import os
os.kill(os.getpid(), 9)

# Setup GPU

* Go to Edit → Notebook Settings
* In the Hardware accelerator menu, select GPU
* Note: selecting an option (GPU, TPU or None) switches to a different kernel/session

---
* How do I know if I am using the GPU?
  * Run the code below. If the output is a non-empty string, you are using a GPU in this session; an empty string means no GPU was found.
  * Typically the output will be /device:GPU:0


In [None]:
import tensorflow as tf
tf.test.gpu_device_name()

# **Connection between: Colab Session and your GitHub Repo**

### Insert your **credentials**

* The variable's content will exist only while the session exists. Once this session terminates, the variable's content will be erased permanently.

In [1]:
from getpass import getpass
import os
from IPython.display import clear_output 

print("=== Insert your credentials === \nType in and hit Enter")
os.environ['UserName'] = getpass('GitHub User Name: ')
os.environ['UserEmail'] = getpass('GitHub User E-mail: ')
os.environ['RepoName'] = getpass('GitHub Repository Name: ')
# GitHub no longer accepts account passwords for Git over HTTPS; use a personal access token
os.environ['UserPwd'] = getpass('GitHub Personal Access Token: ')
clear_output()
print("* Thanks for inserting your credentials!")
print(f"* You may now Clone your Repo to this Session, "
      f"then Connect this Session to your Repo.")

* Thanks for inserting your credentials!
* You may now Clone your Repo to this Session, then Connect this Session to your Repo.


---

### **Clone** your GitHub Repo to your current Colab session

* So you can have access to your project's files

In [2]:
! git clone https://github.com/{os.environ['UserName']}/{os.environ['RepoName']}.git
! rm -rf sample_data   # remove the content/sample_data folder, since we don't need it for this project

import os
if os.path.isdir(os.environ['RepoName']):
  print("\n")
  %cd /content/{os.environ['RepoName']}
  print(f"\n\n* Current session directory is:{os.getcwd()}")
  print(f"* You may refresh the session folder to access {os.environ['RepoName']} folder.")
else:
  print(f"\n* The Repo {os.environ['UserName']}/{os.environ['RepoName']} was not cloned."
        f" Please check your Credentials: UserName and RepoName")

Cloning into 'WalkthroughProject01'...
remote: Enumerating objects: 27655, done.
remote: Counting objects: 100% (87/87), done.
remote: Compressing objects: 100% (60/60), done.
remote: Total 27655 (delta 18), reused 57 (delta 14), pack-reused 27568
Receiving objects: 100% (27655/27655), 332.07 MiB | 35.86 MiB/s, done.
Resolving deltas: 100% (19/19), done.
Checking out files: 100% (55150/55150), done.


/content/WalkthroughProject01


* Current session directory is:/content/WalkthroughProject01
* You may refresh the session folder to access WalkthroughProject01 folder.


---

### **Connect** this Colab session to your GitHub Repo

* So that, if needed, you can push files generated in this session to your Repo.

In [None]:
! git config --global user.email {os.environ['UserEmail']}
! git config --global user.name {os.environ['UserName']}
! git remote rm origin
! git remote add origin https://{os.environ['UserName']}:{os.environ['UserPwd']}@github.com/{os.environ['UserName']}/{os.environ['RepoName']}.git

# The logic: create a temporary file in the session and push it to the repo, then delete the file and push again.
# If both pushes work, it is a sign that the session is connected to the repo.
import uuid
file_name = "session_connection_test_" + str(uuid.uuid4()) # generates a unique file name
with open(f"{file_name}.txt", "w") as file: file.write("text")
print("=== Testing Session Connectivity to the Repo === \n")
! git add . ; ! git commit -m {file_name + "_added_file"} ; ! git push origin main 
print("\n\n")
os.remove(f"{file_name}.txt")
! git add . ; ! git commit -m {file_name + "_removed_file"}; ! git push origin main

# delete your credentials (user name, e-mail and password)
os.environ['UserName'] = os.environ['UserPwd'] = os.environ['UserEmail'] = ""

* If the output above indicates a **failure in the authentication**, please re-enter your credentials.

---

### **Push** generated/new files from this Session to GitHub repo

* Git status

In [None]:
! git status

* Git commit

In [None]:
CommitMsg = "added-cleaned-data"
!git add .
!git commit -m {CommitMsg}

* Git Push

In [None]:
!git push origin main


---

# Load Data

In [5]:
import os
import shutil
import random
import joblib

my_data_dir = '/content/WalkthroughProject01/inputs/data'
# get the class labels
labels = os.listdir(my_data_dir)  # should contain only the class folder names


# create train and test folders, each with one sub-folder per class label
for folder in ['train', 'test']:
  for label in labels:
    os.makedirs(name=my_data_dir + '/' + folder + '/' + label)


In [17]:
labels= ['infected', 'uninfected']
labels

['infected', 'uninfected']

In [63]:

for label in labels:

  files = os.listdir(my_data_dir + '/' + label)
  random.shuffle(files)

  train_set_ratio = 0.7
  train_set_files = int(len(files) * train_set_ratio)

  count = 1
  for file_name in  files:
    if count <= train_set_files:
      shutil.move(my_data_dir + '/' + label + '/' + file_name,
                  my_data_dir + '/train/' + label + '/' + file_name)

    else:
      shutil.move(my_data_dir + '/' + label + '/' + file_name,
              my_data_dir + '/test/' +label + '/'+ file_name)
    count += 1
  


5
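The shuffle-and-split step performed by the loop above can be sketched as a small, side-effect-free function. This is only an illustration under stated assumptions: `split_train_test`, its `seed` parameter, and the example file names are hypothetical and not part of the notebook, and no files are moved.

```python
import random


def split_train_test(file_names, train_set_ratio=0.7, seed=None):
    """Shuffle file names and split them into train and test lists.

    Hypothetical helper: it only returns the two lists and moves no files.
    """
    shuffled = list(file_names)            # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # a seed makes the split reproducible
    n_train = int(len(shuffled) * train_set_ratio)  # same truncation as the loop above
    return shuffled[:n_train], shuffled[n_train:]


# Made-up file names, for illustration only
example_files = [f"cell_{i}.png" for i in range(100)]
train_files, test_files = split_train_test(example_files, train_set_ratio=0.7, seed=0)
print(len(train_files), len(test_files))
```

Returning the two lists first makes it possible to inspect the split before calling `shutil.move` on each file.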

In [None]:

my_list = os.listdir(my_data_dir + '/' + labels[0])  # list the files of the first class


# for each original class folder:
#   get the file names and shuffle them
#   move XX% of the files to the train folder for that class
#   move the remaining files to the test folder for that class



shutil.move("/content/WalkthroughProject01/inputs/datasets/cell_images/cell_images/Parasitized/C100P61ThinF_IMG_20150918_144104_cell_162.png",
            "/content/WalkthroughProject01/inputs/datasets/cell_images/cell_images/C100P61ThinF_IMG_20150918_144104_cell_162.png")


In [None]:
# load data
# split train test set

# create model
# fit model, use TensorBoard, with hyperparameter optimization
# evaluate

In [None]:
import os
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.image import imread

In [None]:
my_data_dir = '/content/WalkthroughProject01/inputs/datasets/cell_images/cell_images'
target_classes = os.listdir(my_data_dir)
target_classes

In [None]:
train_path = my_data_dir + '/train/'  # Colab runs on Linux, so use forward slashes

In [None]:
for x in target_classes:
  print(f"* There are {len(os.listdir(my_data_dir+'/'+x))} images in class {x}.")

In [None]:
para_img = imread(my_data_dir+ '/'+ target_classes[1]+ '/'+ os.listdir(my_data_dir+'/'+target_classes[1])[0])
print(para_img.shape)
plt.imshow(para_img)
plt.show()

image sizes

In [None]:
# path of the first image of the second class
my_data_dir + '/' + target_classes[1] + '/' + os.listdir(my_data_dir + '/' + target_classes[1])[0]

In [None]:
dim1,dim2 = [], []
for label in target_classes:
  for image_filename in os.listdir(my_data_dir+ '/'+ label):
    try:
      img = imread(my_data_dir+ '/'+ label + '/'+ image_filename)
      d1, d2, colors = img.shape
      dim1.append(d1)
      dim2.append(d2)
    except Exception as e:
      print(e)


In [None]:
fig, axes = plt.subplots()
sns.scatterplot(x=dim1, y=dim2, alpha=0.2)
dim1_mean = np.array(dim1).mean()
dim2_mean = np.array(dim2).mean()
axes.axvline(x=dim1_mean,color='#D1349C', linestyle='-')
axes.axhline(y=dim2_mean,color='#D1349C', linestyle='-')
plt.text(x=np.array(dim1).min(), y=np.array(dim2).max()*0.9,
         s=f"Pixel height average: {round(dim1_mean)}\nPixel width average: {round(dim2_mean)}", c='r')
plt.show()

In [None]:
image_shape = (133,132,3)
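The hard-coded shape above presumably comes from the average height and width collected in `dim1` and `dim2`. A minimal sketch of deriving it from the measurements instead of typing it by hand follows; `average_image_shape` and the sample numbers are hypothetical, chosen only so the result matches the value used in this notebook.

```python
from statistics import mean


def average_image_shape(heights, widths, channels=3):
    """Return (mean height, mean width, channels), rounded to whole pixels."""
    return (round(mean(heights)), round(mean(widths)), channels)


# Made-up measurements, for illustration only
sample_heights = [130, 133, 136]
sample_widths = [128, 132, 136]
derived_shape = average_image_shape(sample_heights, sample_widths)
print(derived_shape)
```

In the notebook, the same call could take `dim1` and `dim2` as its two arguments.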

---

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [None]:
image_gen = ImageDataGenerator(rotation_range=20,  # randomly rotate the image by up to 20 degrees
                               width_shift_range=0.10,  # shift the image horizontally by up to 10% of its width
                               height_shift_range=0.10,  # shift the image vertically by up to 10% of its height
                               rescale=1/255,  # rescale pixel values by a factor of 1/255
                               shear_range=0.1,  # shear intensity (shear angle in degrees), not cropping
                               zoom_range=0.1,  # zoom in or out by up to 10%
                               horizontal_flip=True,  # allow random horizontal flips
                               fill_mode='nearest',  # fill newly created pixels with the nearest pixel value
                               validation_split=0.2,  # reserve 20% of the images for validation
                               )
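To make the fractional ranges above more concrete, the sketch below converts them into pixel limits for the image shape used in this notebook. It relies on the documented Keras behaviour that a float shift range below 1 is a fraction of the image size and that a float `zoom_range` z means the interval [1 - z, 1 + z]; the helper `augmentation_limits` itself is hypothetical.

```python
def augmentation_limits(image_shape, width_shift_range=0.10,
                        height_shift_range=0.10, zoom_range=0.1):
    """Translate fractional augmentation ranges into concrete limits."""
    height, width = image_shape[:2]
    return {
        "max_width_shift_px": width * width_shift_range,     # horizontal shift limit
        "max_height_shift_px": height * height_shift_range,  # vertical shift limit
        "zoom_interval": (1 - zoom_range, 1 + zoom_range),   # lower and upper zoom factor
    }


limits = augmentation_limits((133, 132, 3))
for name, value in limits.items():
    print(name, value)
```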

In [None]:
plt.imshow(para_img)

In [None]:
plt.imshow(image_gen.random_transform(para_img))

In [None]:
# image_gen.flow_from_directory(my_data_dir)

---

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense, Conv2D, MaxPooling2D
model = Sequential()

model.add(Conv2D(filters=32, kernel_size=(3,3), input_shape=image_shape, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# input_shape is only needed on the first layer
model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))


model.add(Flatten())


model.add(Dense(128))
model.add(Activation('relu'))

# Dropouts help reduce overfitting by randomly turning neurons off during training.
# Here we say randomly turn off 50% of neurons.
model.add(Dropout(0.5))

# Last layer: the problem is binary, so we use a sigmoid activation
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
model.summary()
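The feature-map sizes reported by the summary can be cross-checked by hand. The sketch below traces them in plain Python, assuming the Keras defaults used in this model (`padding='valid'` and stride 1 for `Conv2D`, and a pooling stride equal to the pool size); the helper names are hypothetical.

```python
def conv_output(size, kernel=3):
    """Output length of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_output(size, pool=2):
    """Output length of max pooling with stride equal to the pool size."""
    return size // pool


def trace_feature_maps(image_shape, filters=(32, 64, 64)):
    """Return the shape after each conv/pool layer and the flattened size."""
    height, width, _ = image_shape
    shapes = []
    for n_filters in filters:
        height, width = conv_output(height), conv_output(width)
        shapes.append(("conv", (height, width, n_filters)))
        height, width = pool_output(height), pool_output(width)
        shapes.append(("pool", (height, width, n_filters)))
    return shapes, height * width * filters[-1]


layer_shapes, flattened_size = trace_feature_maps((133, 132, 3))
for layer, shape in layer_shapes:
    print(layer, shape)
print("flattened:", flattened_size)
```

The flattened size is the number of inputs to `Dense(128)`, which is where most of the model's parameters sit.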

In [None]:
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss',patience=2)
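As a rough illustration of what `patience=2` means, the sketch below walks through a list of validation losses and reports the epoch at which training would stop. It is a simplified model of the idea, not the Keras implementation, and both the function `early_stopping_epoch` and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch index at which training stops, or None if it never does."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses, for illustration only
example_losses = [0.60, 0.45, 0.47, 0.46, 0.40]
stop_epoch = early_stopping_epoch(example_losses, patience=2)
print(stop_epoch)
```

In this example the loss would have improved again at the last epoch, which shows the trade-off of a small patience value.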

---

In [None]:
batch_size = 16
train_image_gen = image_gen.flow_from_directory(my_data_dir,
                                                target_size=image_shape[:2],
                                                color_mode='rgb',
                                                batch_size=batch_size,
                                                class_mode='binary')

In [None]:
train_image_gen.class_indices
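`class_indices` maps each class sub-folder name to an integer label, and Keras documents that the order is alphanumeric. A minimal sketch of that mapping follows; `infer_class_indices` is a hypothetical stand-in, and the folder names are assumed from the labels used earlier in this notebook.

```python
def infer_class_indices(class_folders):
    """Map class folder names to integer labels in alphanumeric order."""
    return {name: index for index, name in enumerate(sorted(class_folders))}


# Folder names assumed for illustration
class_indices = infer_class_indices(["uninfected", "infected"])
print(class_indices)
```

With `class_mode='binary'` and a sigmoid output, a prediction close to 1 then refers to the class mapped to index 1.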

In [None]:
train_image_gen.