<div id="about_dataset">
    <h2>About the dataset</h2>
</div>


Imagine a telecommunications provider has segmented its customer base by service usage patterns, categorizing customers into four groups. If demographic data can be used to predict group membership, the company can customize offers for individual prospective customers. This is a classification problem: given a dataset with predefined labels, we need to build a model that predicts the class of a new or unknown case.

The example uses demographic data, such as region, age, and marital status, to predict usage patterns.

The target field, called **custcat**, has four possible values that correspond to the four customer groups, as follows:

1. Basic Service
2. E-Service
3. Plus Service
4. Total Service

Our objective is to build a classifier that predicts the class of unknown cases.

### Import Libraries

In [None]:
# pandas for loading the dataset, NumPy for numerical operations
import pandas as pd
import numpy as np

### Import dataset

In [143]:
# Load the dataset
dataFrame = pd.read_csv('teleCust1000t.csv')
# Show the first 5 rows of the dataset
dataFrame.head()
# Per-class mean and standard deviation of the features used later by the Gaussian function.
# Note: these are computed on the full dataset, before the train/test split,
# so the test rows also contribute to them.
mean = dataFrame[['tenure', 'age', 'income', 'employ', 'custcat']].groupby('custcat').mean()
standardDeviation = dataFrame[['tenure', 'age', 'income', 'employ', 'custcat']].groupby('custcat').std()
# Convert to nested lists: one row per class (custcat 1 to 4), one column per feature
meanArray = mean.values.tolist()
stdArray = standardDeviation.values.tolist()
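
To make the shape of `meanArray` and `stdArray` concrete, here is a minimal sketch of the same `groupby` pattern on a tiny, made-up DataFrame. The `toy` data and the `toy_*` names are hypothetical and only stand in for `teleCust1000t.csv`; the point is that each inner list corresponds to one class and each position in it to one feature.

```python
import pandas as pd

# Hypothetical toy data standing in for teleCust1000t.csv (values are made up)
toy = pd.DataFrame({
    "tenure":  [10, 20, 30, 50],
    "age":     [30, 40, 35, 45],
    "custcat": [1, 1, 2, 2],
})

grouped = toy.groupby("custcat")
toy_mean = grouped.mean()  # one row per class, one column per feature
toy_std = grouped.std()    # pandas uses the sample standard deviation (ddof=1)

# .values.tolist() drops the class labels: row 0 is custcat 1, row 1 is custcat 2
toy_mean_list = toy_mean.values.tolist()
toy_std_list = toy_std.values.tolist()

print(toy_mean_list)
print(toy_std_list)
```

Because the class labels are dropped in the conversion, any code that indexes these lists has to map list index `i` to class `i + 1` itself.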

### Train Test Split


In [144]:
from sklearn.model_selection import train_test_split
# Extract the feature matrix and the target vector from the DataFrame
X = dataFrame[['tenure', 'age', 'income', 'employ']].values
Y = dataFrame['custcat'].values
# Hold out 20% of the rows for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Get the unique class labels in Y_train and their counts
uniqueNumbers, count = np.unique(Y_train, return_counts=True)
# Combine the class labels and their counts into one dictionary
custcastDic = dict(zip(uniqueNumbers, count))

# Display the dictionary
custcastDic


{1: 209, 2: 175, 3: 219, 4: 197}
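
The counts above are what the classifier later turns into class priors. As a small sketch, the snippet below copies those counts and converts them to relative frequencies; the names `class_counts`, `n_train`, and `priors` are illustrative and do not appear in the notebook.

```python
# Class counts copied from the output above (training split)
class_counts = {1: 209, 2: 175, 3: 219, 4: 197}

# Total number of training rows
n_train = sum(class_counts.values())

# Prior probability of each class = its share of the training rows
priors = {label: count / n_train for label, count in class_counts.items()}

for label, p in priors.items():
    print(f"P(custcat={label}) = {p:.4f}")
```

The four classes are fairly balanced, so the priors alone say little; a model that always predicted the most frequent class (3) would be right on roughly 27% of the training rows. If equal class proportions across the two splits matter, `train_test_split` accepts a `stratify` argument for that purpose.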

### Classification and Probability Calculation
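
The cell in this section scores each class as its prior multiplied by one Gaussian density per feature, then picks the class with the highest score. The notebook does not name the method, but this matches the usual Gaussian naive Bayes decision rule. Multiplying many small densities can underflow, so a common variant sums logarithms instead. The sketch below illustrates that variant under stated assumptions: `log_gauss`, `predict_log_space`, and the two-class parameters are hypothetical and are not taken from the dataset.

```python
import numpy as np

def log_gauss(x, std, mean):
    """Log of the Gaussian density, evaluated element-wise."""
    return -np.log(std * np.sqrt(2 * np.pi)) - (x - mean) ** 2 / (2 * std ** 2)

def predict_log_space(x, means, stds, priors):
    """Return the 1-based class with the highest log-score.

    means, stds: arrays of shape (n_classes, n_features)
    priors:      array of shape (n_classes,)
    """
    x = np.asarray(x, dtype=float)
    # log prior + sum of per-feature log densities, one score per class
    scores = np.log(priors) + log_gauss(x, stds, means).sum(axis=1)
    return int(np.argmax(scores)) + 1

# Hypothetical parameters for two classes and two features (made-up values)
means = np.array([[10.0, 30.0], [50.0, 45.0]])
stds = np.array([[5.0, 8.0], [10.0, 9.0]])
priors = np.array([0.5, 0.5])

print(predict_log_space([12.0, 31.0], means, stds, priors))  # sample near class 1
print(predict_log_space([48.0, 44.0], means, stds, priors))  # sample near class 2
```

Since the logarithm is monotonic, the class that maximizes the log-score is the same one that maximizes the product, so this form should choose the same class as the multiplicative form whenever the latter does not underflow.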

In [151]:
# Gaussian probability density function
def gaussFunc(x, std, mean):
    return 1 / (std * np.sqrt(2 * np.pi)) * np.exp(-(x - mean)**2 / (2 * std**2))

# Return the class (1 to 4) with the highest prior * likelihood for one sample
def calculateProbability(X_test, custcastDict):
    maxProbValue = 0
    index = 0
    # total number of training samples (800 for this split)
    totalCount = sum(custcastDict.values())
    # iterate over the 4 classes
    for i in range(0, 4):
        # likelihood: product of the per-feature Gaussian densities for class i + 1
        result_1 = 1.0
        for j in range(4):
            result_1 *= gaussFunc(X_test[j], stdArray[i][j], meanArray[i][j])

        # weight the likelihood by the class prior
        result = (custcastDict[i+1] / totalCount) * result_1
        # keep the class with the maximum value
        if result > maxProbValue:
            index = i
            maxProbValue = result
    return index + 1

# Predict every test sample and report the accuracy
def result(X_test, Y_test):
    total = 0
    for i in range(len(X_test)):
        v = calculateProbability(X_test[i], custcastDic)
        print("\nX test value : ", X_test[i])
        print("Predicted value : ", v)
        print("Real test value:  ", Y_test[i])

        # count correct predictions
        if v == Y_test[i]:
            total += 1
            print("Prediction is successfull")
        else:
            print("Prediction is unsuccessfull")
        print("\n")

    # accuracy as a percentage of the test samples
    print("----------------------------------\n")
    print("      Accuracy value: " + str(100 * total / len(X_test)))
    print("\n----------------------------------")

result(X_test, Y_test)


X test value :  [33. 33. 42.  7.]
Predicted value :  1
Real test value:   2
Prediction is unsuccessfull



X test value :  [ 5. 31. 34.  9.]
Predicted value :  1
Real test value:   3
Prediction is unsuccessfull



X test value :  [55. 45. 36.  9.]
Predicted value :  2
Real test value:   3
Prediction is unsuccessfull



X test value :  [49. 37. 50. 12.]
Predicted value :  1
Real test value:   2
Prediction is unsuccessfull



X test value :  [66. 55. 80. 24.]
Predicted value :  3
Real test value:   3
Prediction is successfull



X test value :  [19. 29. 18.  2.]
Predicted value :  1
Real test value:   1
Prediction is successfull



X test value :  [ 4. 37. 41.  8.]
Predicted value :  1
Real test value:   1
Prediction is successfull



X test value :  [55. 44. 68. 15.]
Predicted value :  2
Real test value:   3
Prediction is unsuccessfull



X test value :  [34. 23. 24.  7.]
Predicted value :  1
Real test value:   3
Prediction is unsuccessfull



X test value :  [20. 25. 33.  0.]
Predicte