# The Ultimate - Range Distribution

**Please read this so you don't have to read the errors!**  
There is one function that does everything at once, so you don't have to scratch your head over maintaining the data.

### The function takes:
```python
classify_data(data, 
              mode= 'in',
              precentage= False,
              cf= False,
              cf_type= 'lthan',
              cf_per= False)
```

Each parameter is self-explanatory, and you can tweak them as you need.  
Enjoy!

```python
import math

import numpy as np
from table import Table
```

```python
# Range class created for convenience
class Range:
    '''This `Range` is different from the built-in `range`.
    It manages one class interval used while grouping raw, ungrouped
    data into a frequency distribution.'''

    def __init__(self, lower, upper, mode='in'):
        self._lower = lower
        self._upper = upper
        self._mode = mode
        self.freq = 0  # number of observations counted in this class

    def __contains__(self, observation):
        # 'in': the upper limit belongs to the next class (lower <= x < upper)
        if self._mode == 'in':
            return self._lower <= observation < self._upper
        # 'ex': both limits belong to this class (lower <= x <= upper)
        return self._lower <= observation <= self._upper

    def __repr__(self):
        return f'{self._lower} - {self._upper}'
```
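To see how the two modes treat a value sitting exactly on a class limit, here is a minimal sketch. `in_class` is a hypothetical stand-in that repeats the comparison in `Range.__contains__` so the snippet runs on its own.

```python
def in_class(x, lower, upper, mode='in'):
    """Hypothetical helper mirroring Range.__contains__ for both modes."""
    if mode == 'in':
        return lower <= x < upper   # upper limit belongs to the next class
    return lower <= x <= upper      # both limits belong to this class

# 13 is the upper limit of the first class in both modes
print(in_class(13, 0, 13, 'in'), in_class(13, 13, 26, 'in'))      # False True
print(in_class(13, 0, 13, 'ex'), in_class(13, 14, 27, 'ex'))      # True False
# In 'ex' mode a non-integer value can fall into the gap between classes
print(in_class(13.5, 0, 13, 'ex'), in_class(13.5, 14, 27, 'ex'))  # False False
```

The last line shows why `'ex'` mode is meant for integer data: values between one class's upper limit and the next class's lower limit are not counted anywhere.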

```python
def classify_data(data, mode='in', precentage=False, cf=False, cf_type='lthan', cf_per=False):
    '''Group 1D data into class intervals based on the parameters you provide.

    Acceptable parameters:

    • data: the 1D data to classify (required)
    • mode: 'in' for inclusive, 'ex' for exclusive
        - 'in': continuous classes (0 - 13, 13 - 26, ...); a value equal to
          an upper limit is counted in the next class
        - 'ex': classes 0 - 13, 14 - 27, ...; both limits are included
    • precentage: True or False, adds each class's share of the total
    • cf: True or False, adds a cumulative frequency column
    • cf_type: 'lthan' for less than, 'gthan' for greater than (used only if cf=True)
    • cf_per: True or False, adds the cumulative frequency as a percentage (used only if cf=True)

    Returns (rows, total), where each row is [class, frequency, ...extra columns].
    '''
    mode = mode.lower()
    if mode not in ('in', 'ex'):
        raise ValueError(f"The mode must be either 'in' or 'ex'. You provided -> {mode}")
    offset = 1 if mode == 'ex' else 0

    low = min(data)
    range_ = max(data) - low

    # Number of classes from a Sturges'-style rule (applied here to the range)
    k = math.ceil(1 + 3.322 * np.log10(range_))
    # Same as ceil(range_ / k), except when range_ divides evenly: the +1 then
    # keeps k * class_width > range_, so max(data) still lands in the last class
    class_width = range_ // k + 1

    # Build the class intervals
    prev = low
    intervals = []
    for _ in range(k):
        intervals.append(Range(prev, prev + class_width, mode=mode))
        prev += class_width + offset

    # Count frequencies: each observation goes into the first class containing it
    for obs in data:
        for class_ in intervals:
            if obs in class_:
                class_.freq += 1
                break

    # Reshape into rows for tabular printing
    reshaped = [[entry, entry.freq] for entry in intervals]
    total = sum(row[1] for row in reshaped)

    ## Handling parameter conditions

    # Percentage of the total in each class
    if precentage:
        for record in reshaped:
            prc = round(record[1] * 100 / total, 1)
            record.append(f'{prc} %')

    # Cumulative frequency column, plus its percentage if cf_per is set
    def cf_columns(cumulated):
        if cf_per:
            per = round(100 * cumulated / total, 1)
            return [cumulated, f'{per} %']
        return [cumulated]

    if cf:
        cf_type = cf_type.lower()
        if cf_type not in ('lthan', 'gthan'):
            raise ValueError(f"The cf_type must be either 'lthan' or 'gthan'. You provided -> {cf_type}")
        if cf_type == 'lthan':
            cumulated = 0
            for record in reshaped:
                cumulated += record[1]                # this class and all classes below it
                record.extend(cf_columns(cumulated))
        else:
            remaining = total
            for record in reshaped:
                record.extend(cf_columns(remaining))  # this class and all classes above it
                remaining -= record[1]
    return reshaped, total
```
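Here is the class-count arithmetic for the sample data used below, assuming its lowest value is 0 and its highest is 99 (the tables are consistent with that), so the range is 99:

$$
\begin{aligned}
k &= \left\lceil 1 + 3.322 \log_{10} 99 \right\rceil = \lceil 7.63 \rceil = 8 \\
w &= \left\lceil 99 / 8 \right\rceil = \lceil 12.375 \rceil = 13
\end{aligned}
$$

That is where the 8 classes of width 13 come from. For comparison, Sturges' rule as usually stated uses the number of observations $N$ instead of the range; with $N = 200$ it would give $\lceil 1 + 3.322 \log_{10} 200 \rceil = \lceil 8.64 \rceil = 9$ classes.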


# That's it!
Let's call the function to make it work. 

```python
# Data: 200 random integers from 0 to 99
data= np.random.randint(0, 100, 200)
```

##### Inclusive

```python
cls, total = classify_data(data, mode= 'in')

Table(cls, headers= ['Classes', 'Frequencies']).construct()
print('Total: ', total)
```

```
 + ---------- + ------------- + 
 |  Classes   |  Frequencies  | 
 + ---------- + ------------- + 
 |     0 - 13 |            31 | 
 |    13 - 26 |            30 | 
 |    26 - 39 |            30 | 
 |    39 - 52 |            21 | 
 |    52 - 65 |            23 | 
 |    65 - 78 |            20 | 
 |    78 - 91 |            29 | 
 |   91 - 104 |            16 | 
 + ---------- + ------------- + 
Total:  200
```
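`Table` comes from a `table` module that is not shown here. If it is not available, a hypothetical `format_table` helper like the one below prints a similar layout from the rows `classify_data` returns. Its padding and alignment are guesses at the style above, not the real `Table` behaviour.

```python
def format_table(rows, headers):
    """Hypothetical stand-in for Table(...).construct(): returns a bordered text table."""
    cells = [[str(value) for value in row] for row in rows]
    # Each column is as wide as its longest entry, plus two spaces of padding
    widths = [max([len(h)] + [len(row[i]) for row in cells]) + 2
              for i, h in enumerate(headers)]
    border = ' + ' + ' + '.join('-' * w for w in widths) + ' + '
    header = ' | ' + ' | '.join(h.center(w) for h, w in zip(headers, widths)) + ' | '
    body = [' | ' + ' | '.join(c.rjust(w) for c, w in zip(row, widths)) + ' | '
            for row in cells]
    return '\n'.join([border, header, border, *body, border])

print(format_table([['0 - 13', 31], ['13 - 26', 30]], ['Classes', 'Frequencies']))
```

Because `str()` falls back to `Range.__repr__`, the `Range` objects in the first column print as `lower - upper`.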


##### Exclusive

```python
cls, total = classify_data(data, mode= 'ex')

Table(cls, headers= ['Classes', 'Frequencies']).construct()
print('Total: ', total)
```

```
 + ---------- + ------------- + 
 |  Classes   |  Frequencies  | 
 + ---------- + ------------- + 
 |     0 - 13 |            31 | 
 |    14 - 27 |            35 | 
 |    28 - 41 |            31 | 
 |    42 - 55 |            20 | 
 |    56 - 69 |            28 | 
 |    70 - 83 |            22 | 
 |    84 - 97 |            29 | 
 |   98 - 111 |             4 | 
 + ---------- + ------------- + 
Total:  200
```
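The class limits in the inclusive and exclusive tables both follow from a lowest value of 0, a width of 13 and 8 classes (read off the tables); the only difference is the one-unit offset between classes in `'ex'` mode. `class_limits` is a hypothetical helper that repeats the interval loop from `classify_data`:

```python
def class_limits(low, width, k, mode='in'):
    """Hypothetical helper repeating the interval loop in classify_data.

    'in': each class starts where the previous one ends.
    'ex': each class starts one unit after the previous upper limit.
    """
    offset = 1 if mode == 'ex' else 0
    limits = []
    lower = low
    for _ in range(k):
        limits.append((lower, lower + width))
        lower += width + offset
    return limits

inclusive = class_limits(0, 13, 8, 'in')
exclusive = class_limits(0, 13, 8, 'ex')
print(inclusive[1], inclusive[-1])   # (13, 26) (91, 104)
print(exclusive[1], exclusive[-1])   # (14, 27) (98, 111)
```

Each `'ex'` class holds 14 integer values (0 through 13, for example) against 13 for `'in'`, so the first seven exclusive classes already cover 0 to 97. Only 98 and 99 are left for the last class, which is why its frequency is so small.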


##### Other options if you need them

```python
cls, total = classify_data(data, mode= 'ex', precentage= True, cf= True, cf_type= 'lthan', cf_per= True)

Table(cls, headers= ['Classes', 'Frequencies', 'Freq_per', 'Less_than cf', 'cf_per']).construct(rows= False)
print('Total: ', total)
```

```
 + ---------- + ------------- + ---------- + -------------- + --------- + 
 |  Classes   |  Frequencies  |  Freq_per  |  Less_than cf  |   cf_per  | 
 + ---------- + ------------- + ---------- + -------------- + --------- + 
 |     0 - 13 |            31 |     15.5 % |             31 |    15.5 % | 
 |    14 - 27 |            35 |     17.5 % |             66 |    33.0 % | 
 |    28 - 41 |            31 |     15.5 % |             97 |    48.5 % | 
 |    42 - 55 |            20 |     10.0 % |            117 |    58.5 % | 
 |    56 - 69 |            28 |     14.0 % |            145 |    72.5 % | 
 |    70 - 83 |            22 |     11.0 % |            167 |    83.5 % | 
 |    84 - 97 |            29 |     14.5 % |            196 |    98.0 % | 
 |   98 - 111 |             4 |      2.0 % |            200 |   100.0 % | 
 + ---------- + ------------- + ---------- + -------------- + --------- + 
Total:  200
```
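The cumulative columns are running sums of the frequency column. As a cross-check, here is a sketch with NumPy's `cumsum` using the frequencies from the table above. It reproduces the percentage and less-than columns, and also shows what `cf_type='gthan'` would give for the same classes (those values are computed here, not taken from a run of `classify_data`).

```python
import numpy as np

# Frequencies copied from the exclusive table above
freq = np.array([31, 35, 31, 20, 28, 22, 29, 4])
total = freq.sum()

freq_per = np.round(100 * freq / total, 1)                    # share of each class
less_than = np.cumsum(freq)                                   # 'lthan': this class and all below
greater_than = total - np.concatenate(([0], less_than[:-1]))  # 'gthan': this class and all above
cf_per = np.round(100 * less_than / total, 1)

print(less_than.tolist())     # [31, 66, 97, 117, 145, 167, 196, 200]
print(greater_than.tolist())  # [200, 169, 134, 103, 83, 55, 33, 4]
print(cf_per.tolist())        # [15.5, 33.0, 48.5, 58.5, 72.5, 83.5, 98.0, 100.0]
```

The less-than column ends at the total, while the greater-than column starts at it; both carry the same information read from opposite ends.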



## Time Complexity

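I timed the function **with all parameters turned on** and got the run times below. For a fixed number of classes k (8 for this data, whose range is 99), the counting loop checks each observation against at most k classes, so the work is O(N·k) and the run time should grow roughly linearly with the number of samples N. The measurements agree: 10× more samples costs about 9× more time, and 1,000× more samples costs about 880× more.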

##### With 1,000 samples

```python
data= np.random.randint(0, 100, 1000)
%timeit classify_data(data, mode= 'ex', precentage= True, cf= True, cf_type= 'lthan', cf_per= True)
```

```
3.53 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```


##### With 10,000 samples

```python
data= np.random.randint(0, 100, 10000)
%timeit classify_data(data, mode= 'ex', precentage= True, cf= True, cf_type= 'lthan', cf_per= True)
```

```
32.9 ms ± 3.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```


##### With 1,000,000 samples

```python
data= np.random.randint(0, 100, 10_00_000)
%timeit classify_data(data, mode= 'ex', precentage= True, cf= True, cf_type= 'lthan', cf_per= True)
```

```
3.11 s ± 105 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```



You can try it with more, and larger, samples. This is the most optimized version I came up with.  
It can certainly be optimized further, but it works well for my purposes.
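One possible direction for further optimization (a sketch, not the author's version): the nested loop checks every observation against up to k classes in Python. Because all classes have the same width, the class index of each observation can instead be computed arithmetically for the whole array at once and tallied with `np.bincount`. `class_frequencies` is a hypothetical name; it assumes numeric data with at least two distinct values and, for `mode='ex'`, integer data.

```python
import math
import numpy as np

def class_frequencies(data, mode='in'):
    """Hypothetical vectorized counter for the classes classify_data builds.

    Assumes numeric data with at least two distinct values; for mode='ex'
    the data must be integers, as with the Range-based version.
    """
    data = np.asarray(data)
    low = data.min()
    range_ = data.max() - low
    k = math.ceil(1 + 3.322 * np.log10(range_))  # same class-count rule
    width = range_ // k + 1                       # guarantees k * width > range_
    # 'in' classes start every `width` units, 'ex' classes every `width + 1`
    step = width + 1 if mode == 'ex' else width
    # Class index of every observation, computed in one array operation
    idx = ((data - low) // step).astype(int)
    # Tally the indices; minlength keeps empty classes in the result
    return np.bincount(idx, minlength=k)

freq = class_frequencies(np.random.randint(0, 100, 1_000_000), mode='ex')
print(freq.sum())  # 1000000
```

For integer data this should give the same counts as the `Range` loop, since floor division by the class step picks out exactly the class whose limits contain the value. Timing it with `%timeit` as above would show how much it helps on your machine.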

# Thanks!