### Creating the Dictionary
We are given training data, x and y.  
During training, we want to arrange the data so that the probabilities $P(X^j = x^j \mid y = a_{i})$ are easy to look up.  
To find the most likely class for a data point, we need $P(X^j = x^j \mid y = a_{i})$ and $P(y = a_{i})$.  
So we should be able to find both of these probabilities quickly.  
To do that, we:  
1. Create a dictionary with one key for each class:   
    $d[a_{1}]$  
    $d[a_{2}]$  
    $d[a_{3}]$    
    Let's say these are the only possible classes.  
    The first level then has one key per class.  
    
2. Within $d[a_{1}]$, add a second level keyed by feature number $j$, and for each feature store 
   all the values that feature can take, for example $x_{1}^j, x_{2}^j, \dots, x_{k}^j$:  
   $d[a_{1}] \rightarrow [j] \rightarrow [x_{1}^j]$  
   $d[a_{1}] \rightarrow [j] \rightarrow [x_{2}^j]$  
   $\vdots$  
   $d[a_{1}] \rightarrow [j] \rightarrow [x_{k}^j]$  
   So these are the possible values that the jth feature can take.  
3. Then we store a count under each value of $x^j$, for example under $[x_{1}^j]$:  
   $d[a_{1}] \rightarrow [j] \rightarrow [x_{1}^j] \rightarrow count$

Once we have that, it is straightforward to calculate $P(X^j = x^j \mid y = a_{i})$: we take this count and divide it by the total number of training data points in class $a_{i}$.  
That means $P(X^j = x^j \mid y = a_{i}) = \frac{\text{count}}{\text{total training data in class } a_{i}}$, where the count is the value stored in the dictionary.  
We also need $P(y = a_{i})$, and this probability is $\frac{\text{total training data in class } a_{i}}{\text{total training data}}$.  
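
As a rough illustration of the layout described above, the sketch below builds a tiny dictionary by hand and reads both probabilities from it. The class labels (`"a1"`, `"a2"`), the feature values, the counts, and the helper names `conditional_probability` and `prior_probability` are all invented for this example; only the nesting (class, then feature number, then feature value, then count) and the `total_count` / `total_data` keys come from this notebook.

```python
# Hypothetical, hand-built dictionary with the nesting described above:
# class -> feature number -> feature value -> count.
# All labels and counts are invented for illustration.
d = {
    "a1": {
        "total_count": 4,                 # training points in class a1
        1: {"sunny": 3, "rainy": 1},      # counts of feature 1 values within a1
    },
    "a2": {
        "total_count": 6,                 # training points in class a2
        1: {"sunny": 2, "rainy": 4},      # counts of feature 1 values within a2
    },
    "total_data": 10,                     # all training points
}


def conditional_probability(d, cls, j, value):
    """Estimate P(X^j = value | y = cls) as count / class total."""
    count = d[cls][j].get(value, 0)       # a value never seen in this class counts as 0
    return count / d[cls]["total_count"]


def prior_probability(d, cls):
    """Estimate P(y = cls) as class total / all training points."""
    return d[cls]["total_count"] / d["total_data"]


print(conditional_probability(d, "a1", 1, "sunny"))  # 3 / 4
print(prior_probability(d, "a1"))                    # 4 / 10
```

Each lookup is a couple of dictionary accesses and one division, which is why this shape makes both probabilities quick to find.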

In [1]:
import numpy as np

In [None]:
##1. The fit function creates the dictionaries; afterwards we write the predict function, which takes the dictionary
## and the testing data and returns the predictions corresponding to that testing data.  

def fit(X_train, Y_train):
    ##2. result will be a dictionary. This result will have some keys.  
    result = {}
    
    
    ##2. result will have keys as the distinct(unique) values of Y_train. 
    class_values = set(Y_train)
    
    ##3. Now we have our top level keys. 
    ##3. We are going through all possible classes.  
    for current_class in class_values:
        ##4. result of current_class will again be a dictionary.
        ##4. and this dictionary will again have all possible features.  
        result[current_class] = {}
        
        ## 16. We also store the total number of training data points (needed for P(y = a_i)).
        result['total_data'] = len(Y_train)
        
        ## 11. Here we pick out just the training data whose class is current_class.
        current_class_rows = (Y_train == current_class)
        ## 11. The line above returns a True/False array (Y_train must be a NumPy array).
        
        ## 12. To get the training data for the current class, we index X_train with that array.
        X_train_current = X_train[current_class_rows]
        ## 12. We get only those rows where current_class_rows is True.
        
        ## 13. Similarly, 
        Y_train_current = Y_train[current_class_rows]
        
        
        ##5. To find the number of features we do this.  
        num_features = X_train.shape[1]
        
        ##6. Let's pass through each feature.  
        
        ## 15. Along with this, we store the total count of training data that belongs to the current class.
        result[current_class]['total_count'] = len(Y_train_current)
        
        for j in range(1, num_features + 1):   ## so that feature keys are counted from 1.
            
            ##7. For each of these features we create a new dictionary.  
            result[current_class][j] = {}
            
            ##8. In this dict, e.g. result[a1][2], we store every value that feature 2 can take,
            ##8. and for each of those values we store its count within class a1.
            ##8. Feature keys start at 1 but NumPy columns start at 0, so feature j is column j - 1.
            all_possible_values = set(X_train[:, j - 1])
            
            ##9. Then we go through every possible value in the set. 
            for current_value in all_possible_values:
                
                ##10. For each value we want its count: among the training points whose class
                ##10. is current_class, how many have the jth feature equal to current_value?
                
                ## 14. X_train_current[:, j - 1] == current_value gives a True/False NumPy array.
                ## 14. .sum() counts each True as 1 and each False as 0, so the sum is the count.
                result[current_class][j][current_value] = (X_train_current[:, j - 1] == current_value).sum()
    
    ## 17. Return the finished dictionary so that predict can use it.
    return result
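
The comment at the top of this cell mentions a predict step that takes the dictionary and the testing data. The following is a minimal sketch of what that step could look like, not the notebook's own implementation. `fit_counts` is a compact restatement of the counting logic above so that the example runs on its own; `predict_single`, the toy arrays, and the choice to multiply raw probabilities without any smoothing are assumptions made for illustration.

```python
import numpy as np


def fit_counts(X_train, Y_train):
    """Compact restatement of the counting step:
    class -> feature number (1-based) -> feature value -> count."""
    result = {"total_data": len(Y_train)}
    for current_class in set(Y_train):
        rows = (Y_train == current_class)            # boolean mask for this class
        X_current = X_train[rows]
        result[current_class] = {"total_count": int(rows.sum())}
        for j in range(1, X_train.shape[1] + 1):
            column = X_current[:, j - 1]             # feature j lives in column j - 1
            result[current_class][j] = {
                value: int((column == value).sum())
                for value in set(X_train[:, j - 1])
            }
    return result


def predict_single(result, x):
    """Hypothetical helper: return the class with the largest
    P(y = a) * product over j of P(X^j = x^j | y = a). No smoothing is applied."""
    best_class, best_score = None, -1.0
    for current_class, class_dict in result.items():
        if not isinstance(class_dict, dict):         # skip the 'total_data' entry
            continue
        score = class_dict["total_count"] / result["total_data"]   # P(y = a)
        for j, value in enumerate(x, start=1):
            count = class_dict[j].get(value, 0)      # unseen value -> count of 0
            score *= count / class_dict["total_count"]             # P(X^j = x^j | y = a)
        if score > best_score:
            best_class, best_score = current_class, score
    return best_class


# Toy data, invented for this example: two binary features, two classes.
X = np.array([[0, 0], [0, 1], [1, 1], [1, 1], [1, 0], [0, 0]])
Y = np.array([0, 0, 1, 1, 1, 0])

model = fit_counts(X, Y)
print(predict_single(model, [0, 0]))
print(predict_single(model, [1, 1]))
```

Because no smoothing is applied in this sketch, a feature value that never occurs with a class drives that class's score to zero.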
                
                
                
            
             
        
        
    