#  <span style="color:blue">Introduction to Hashing</span>



## Hashing

Suppose we want to design a system for storing employee records with phone numbers as keys, and we want the following operations to be performed efficiently:

- Insert a phone number and corresponding information.
- Search a phone number and fetch the information.
- Delete a phone number and related information.

### With hashing, these operations take O(1) time on average

We can think of using the following data structures to maintain information about different phone numbers. 

- Array of phone numbers and records.
- Linked List of phone numbers and records.
- Balanced binary search tree with phone numbers as keys.
- Direct Access Table.


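For arrays and linked lists, we need to search in a linear fashion, which takes O(n) time and can be costly in practice. If we use an array and keep the data sorted, a phone number can be found in O(log n) time using binary search, but insert and delete become costly because the sorted order must be maintained. A balanced binary search tree supports search, insert, and delete in O(log n) time each. A direct access table supports all three in O(1) time, but it needs an array with one slot for every possible phone number, which wastes a large amount of memory.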

### Basic Operations:

- HashTable: Creates a new hash table.
- Delete: Deletes a particular key-value pair from the hash table.
- Get: Searches for a key in the hash table and returns the value associated with that key.
- Put: Inserts a new key-value pair into the hash table.
- DeleteHashTable: Deletes the entire hash table.
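
As a minimal sketch of these operations, Python's built-in dict (which is a hash table) can play the role of the phone directory. The phone numbers and record fields below are made up for illustration.

```python
# Hypothetical phone directory backed by Python's built-in dict.
records = {}                       # HashTable: create an empty table

# Put: insert key-value pairs (phone number -> record)
records["555-0101"] = {"name": "Asha", "dept": "Sales"}
records["555-0102"] = {"name": "Ravi", "dept": "Support"}

# Get: look up a key; get() returns None when the key is absent
found = records.get("555-0101")
missing = records.get("555-0199")

# Delete: remove one key-value pair
del records["555-0102"]

print(found)
print(missing)
print(len(records))

# DeleteHashTable: drop every entry
records.clear()
```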


## Hashing Application

### Applications of Hashing

Hashing provides constant-time search, insert, and delete operations on average. This is why the hash table is one of the most widely used data structures. Typical problems include finding distinct elements, counting the frequencies of items, and finding duplicates.

### Common applications:

- Database indexing: Hashing is used to index and retrieve data efficiently in databases and other data storage systems.

- Password storage: Hashing is used to store passwords securely by applying a hash function to the password and storing the hashed result, rather than the plain text password.

- Data compression: Hashing is used in dictionary-based compression algorithms; for example, LZ77-style compressors commonly use hash tables to find earlier occurrences of a string quickly.

- Search structures: Hashing underlies data structures such as hash tables and Bloom filters, which support fast lookups and membership queries.

- Cryptography: Hashing is used in cryptography to generate digital signatures, message authentication codes (MACs), and key derivation functions.

- Load balancing: Hashing is used in load-balancing algorithms, such as consistent hashing, to distribute requests to servers in a network.

- Blockchain: Hashing is used in blockchain technology, such as the proof-of-work algorithm, to secure the integrity and consensus of the blockchain.

- Image processing: Hashing is used in image processing applications, such as perceptual hashing, to detect and prevent image duplicates and modifications.

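- File comparison: Hash functions such as MD5, SHA-1, and SHA-256 are used to compare files and verify their integrity. MD5 and SHA-1 are no longer considered collision-resistant, so they are suitable for detecting accidental corruption but not deliberate tampering.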

- Fraud detection: Hashing is used in fraud detection and cybersecurity applications, such as intrusion detection and antivirus software, to detect and prevent malicious activities.

There are many other applications of hashing, including modern cryptographic hash functions. Some of these applications are listed below:

- Message Digest: Creating a unique fingerprint of data to ensure its integrity and detect changes.
- Password Verification: Securely storing passwords as hashes instead of plain text.
- Data Structures(Programming Languages): Implementing hash tables for efficient lookups, insertion, and deletion of data.
- Compiler Operation: Using hashing techniques for symbol tables and faster code optimization.
- Rabin-Karp Algorithm: Efficient pattern searching within text.
- Cryptography: Hashing is one of the basic building blocks of modern cryptography.
- Caches: Caches are temporary storage areas that speed up data access. Hashing helps caches work efficiently.
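
The password verification item can be sketched with the standard library's `hashlib.pbkdf2_hmac`. This is an illustration under stated assumptions, not a production recipe: the helper names, the 16-byte salt, and the iteration count are all choices made for this example.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, derived_key) for the given password (illustrative helper)."""
    if salt is None:
        salt = os.urandom(16)      # fresh random salt for each password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, key

def verify_password(password, salt, expected_key, iterations=100_000):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt, iterations)
    return hmac.compare_digest(key, expected_key)

# Only the salt and the derived key are stored, never the plain text password
salt, stored_key = hash_password("correct horse")
ok = verify_password("correct horse", salt, stored_key)
bad = verify_password("wrong guess", salt, stored_key)
print(ok)    # True
print(bad)   # False
```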


#### Advantages of Hashing

- Efficiency: Hashing allows for fast lookups, searches, and retrievals of data, with an average time complexity of O(1) for hash table lookups.

- Dynamic: Hash tables can be resized as the number of entries grows, making them suitable for growing and changing datasets.

- Secure: Hashing provides a secure method for storing and retrieving sensitive information, such as passwords, as the original data is transformed into a hash value that is difficult to reverse.

- Simple: Hashing is a simple and straightforward concept, making it easy to implement and understand.

- Scalable: Hashing can be scaled to handle large amounts of data, making it suitable for big data applications.

- Uniqueness: A good hash function makes it unlikely that two different inputs produce the same hash value. Collisions can never be ruled out entirely, because there are more possible inputs than hash values, so they must still be handled.

- Verification: Hashing can be used for data verification, such as checking the integrity of files, because with a good hash function even a small change in the input data almost certainly results in a different hash value.

- Space-efficient: When only a fixed-size hash value needs to be stored instead of the original data (for example, a file checksum), hashing reduces the amount of memory required.

- Error detection: Hashing can be used for error detection, as it can detect errors in data transmission, storage, or processing.

- Speed: Hashing is a fast and efficient method for processing data, making it suitable for real-time and high-performance applications.
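
The verification and error detection points can be illustrated with `hashlib.sha256` from the standard library. The two messages below are invented for this sketch; they differ in a single character.

```python
import hashlib

original = b"transfer 100 to account 42"
tampered = b"transfer 900 to account 42"   # one character changed

digest_original = hashlib.sha256(original).hexdigest()
digest_tampered = hashlib.sha256(tampered).hexdigest()

print(digest_original)
print(digest_tampered)

# Comparing the digests reveals the modification
print(digest_original == digest_tampered)   # False
```

The same input always yields the same digest, so a receiver can recompute the digest and compare it with the one that was published.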

## Direct Address Table

### Direct Address Table

A direct address table is a data structure that maps records to their keys using an array. Records are placed using their key values directly as indexes, which makes searching, insertion, and deletion fast.

We can understand the concept using the following example. We create an array of size equal to the maximum key value plus one (assuming 0-based indexing) and then use the key values as indexes. For example, in the following diagram key 21 is used directly as an index.

 
![image.png](attachment:image.png)


#### Advantages:

-  Searching in O(1) Time: Direct address tables use arrays which are random access data structure, so, the key values (which are also the index of the array) can be easily used to search the records in O(1) time.
-  Insertion in O(1) Time: We can easily insert an element in an array in O(1) time. The same thing follows in a direct address table also.
-  Deletion in O(1) Time: Deletion of an element takes O(1) time in an array. Similarly, to delete an element in a direct address table we need O(1) time.
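
The three O(1) operations above can be sketched as follows. The class name, method names, and the maximum key of 99 are assumptions made for this example.

```python
class DirectAddressTable:
    """Sketch of a direct address table for integer keys in [0, max_key]."""

    def __init__(self, max_key):
        # One slot per possible key; None marks an empty slot
        self.slots = [None] * (max_key + 1)

    def insert(self, key, record):
        self.slots[key] = record     # O(1): the key is the index

    def search(self, key):
        return self.slots[key]       # O(1): returns None if absent

    def delete(self, key):
        self.slots[key] = None       # O(1)


table = DirectAddressTable(max_key=99)
table.insert(21, "record for key 21")
print(table.search(21))   # record for key 21
table.delete(21)
print(table.search(21))   # None
```

Note that the table allocates 100 slots even though only one record is stored, which is the memory cost discussed in the limitations.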
 
#### Limitations:

-  The maximum key value must be known in advance.
-  It is practical only if the maximum key value is small.
-  It wastes memory if there is a significant difference between the total number of records and the maximum key value.

Hashing can overcome these limitations of direct address tables.

#### How to handle collisions?
Because every key has its own slot, two different keys never map to the same index. A collision can only occur when several records share the same key. Such collisions can be handled as in hashing, using either chaining or open addressing. The only difference from hashing is that we do not use a hash function to find the index; we use the key values directly as indexes.



##  <span style="color:blue">Introduction to Hashing</span>

Hashing is a technique for mapping keys and values into a hash table by using a hash function. It is done for faster access to elements. The efficiency of the mapping depends on the efficiency of the hash function used.

Let a hash function H(x) map the value x to the index x % 10 in an array. For example, the values [11, 12, 13, 14, 15] are stored at positions {1, 2, 3, 4, 5} of the array (the hash table), respectively.

 

![image.png](attachment:image.png)

There are many hash functions that use numeric or alphanumeric keys. This article focuses on discussing different hash functions:

- Division Method.

- Mid Square Method.

- Folding Method.

- Multiplication Method.

Let’s begin discussing these methods in detail.

#### 1. Division Method:

This is the simplest method for generating a hash value. The hash function divides the key k by M and uses the remainder.

Formula:

h(k) = k mod M

Here,

k is the key value, and

M is the size of the hash table.

M is best chosen as a prime number, because this tends to distribute the keys more uniformly. The hash value depends only on the remainder of the division.

Example:

k = 12345

M = 95

h(12345) = 12345 mod 95

= 90

k = 1276

M = 11

h(1276) = 1276 mod 11

= 0

Pros:

This method works for any value of M, although some values distribute keys better than others.

The division method is very fast since it requires only a single division operation.

Cons:

This method can lead to poor performance, since consecutive keys map to consecutive hash values in the hash table.

The value of M must be chosen with care; for example, a power of 10 or a power of 2 makes the hash depend only on the last few digits or bits of the key.
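
A minimal sketch of the division method, reproducing the two worked examples above. The function name `division_hash` is chosen for this illustration.

```python
def division_hash(k, m):
    """Division method: h(k) = k mod M."""
    return k % m

print(division_hash(12345, 95))   # 90
print(division_hash(1276, 11))    # 0

# Consecutive keys land in consecutive slots, as noted under Cons
print([division_hash(k, 11) for k in range(100, 105)])   # [1, 2, 3, 4, 5]
```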

#### 2. Mid Square Method:

The mid-square method is a hashing method that generally distributes keys well. It involves two steps to compute the hash value:

1. Square the value of the key k, i.e. compute k².

2. Extract the middle r digits of k² as the hash value.

Formula:

h(k) = middle r digits of k²

Here,

k is the key value.

The value of r can be decided based on the size of the table.

Example:

Suppose the hash table has 100 memory locations. So r = 2 because two digits are required to map the key to the memory location.

k = 60

k x k = 60 x 60

= 3600

h(60) = 60

The hash value obtained is 60

Pros:

The performance of this method is good because most or all digits of the key contribute to the middle digits of the squared result.

The result is not dominated by the distribution of the top digit or bottom digit of the original key value.

Cons:

The size of the key is a limitation of this method: squaring a key roughly doubles its number of digits, so large keys produce very large intermediate values.

Collisions can still occur, so a collision handling technique is still required.
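
The two steps of the mid-square method can be sketched as below. The function name is an assumption, and so is the convention for picking the "middle" when the digits cannot be centred exactly (this sketch leans to the left).

```python
def mid_square_hash(k, r):
    """Mid-square method: the middle r decimal digits of k * k."""
    squared = str(k * k)
    # Pad on the left so that very small squares still have r digits
    squared = squared.zfill(r)
    # Start index of the middle r digits (rounds down when not centred)
    start = (len(squared) - r) // 2
    return int(squared[start:start + r])

print(mid_square_hash(60, 2))     # 60 * 60 = 3600, middle digits "60" -> 60
print(mid_square_hash(1234, 3))   # 1234 * 1234 = 1522756, middle "227" -> 227
```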

#### 3. Digit Folding Method:

This method involves two steps:

1. Divide the key value k into a number of parts k1, k2, k3, …, kn, where each part has the same number of digits except for the last part, which can have fewer digits than the other parts.

2. Add the individual parts. The hash value is obtained by ignoring the last carry, if any.

Formula:

k = k1, k2, k3, k4, …, kn

s = k1 + k2 + k3 + k4 + … + kn

h(k) = s

Here, s is obtained by adding the parts of the key k.

Example:

k = 12345

k1 = 12, k2 = 34, k3 = 5

s = k1 + k2 + k3

= 12 + 34 + 5

= 51

h(12345) = 51

Note:

The number of digits in each part depends on the size of the hash table. For example, if the size of the hash table is 100, then each part must have two digits, except for the last part, which can have fewer digits.
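
A short sketch of digit folding, assuming decimal digits and a hypothetical function name `folding_hash`. The carry is dropped by keeping only the last `part_len` digits of the sum.

```python
def folding_hash(k, part_len):
    """Digit folding: split k into parts of part_len digits and add them."""
    digits = str(k)
    parts = [int(digits[i:i + part_len])
             for i in range(0, len(digits), part_len)]
    total = sum(parts)
    # Ignore any carry beyond part_len digits
    return total % (10 ** part_len)

print(folding_hash(12345, 2))   # 12 + 34 + 5 = 51
print(folding_hash(9999, 2))    # 99 + 99 = 198, carry dropped -> 98
```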

#### 4. Multiplication Method

This method involves the following steps:

1. Choose a constant value A such that 0 < A < 1.

2. Multiply the key value k by A.

3. Extract the fractional part of kA.

4. Multiply the result of step 3 by the size of the hash table, M.

5. Take the floor of the result of step 4 to obtain the hash value.

Formula:

h(k) = floor(M (kA mod 1))

Here,

M is the size of the hash table.

k is the key value.

A is a constant value.

Example:

k = 12345

A = 0.357840

M = 100

h(12345) = floor[ 100 (12345*0.357840 mod 1)]

= floor[ 100 (4417.5348 mod 1) ]

= floor[ 100 (0.5348) ]

= floor[ 53.48 ]

= 53

Pros:

The multiplication method works with any value of A between 0 and 1, although some values tend to give better results than others.

Cons:

The multiplication method is fastest when the table size is a power of two, because the index can then be computed with a multiplication and a bit shift; for other table sizes this shortcut is not available.
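
The steps of the multiplication method can be sketched directly from the formula, using floating-point arithmetic for simplicity. The function name is an assumption for this example.

```python
import math

def multiplication_hash(k, a, m):
    """Multiplication method: h(k) = floor(M * (k*A mod 1))."""
    fractional = (k * a) % 1          # fractional part of k*A
    return math.floor(m * fractional)

print(multiplication_hash(12345, 0.357840, 100))   # 53
```

Because the fractional part is always in the range [0, 1), the result is always a valid index between 0 and M - 1.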



## Collision Handling

### Collision Handling: 
Since a hash function maps a large key to a small number, two keys may produce the same value. The situation where a newly inserted key maps to an already occupied slot in the hash table is called a collision, and it must be handled using a collision handling technique. The following are ways to handle collisions:

### Chaining:
The idea is to make each cell of hash table point to a linked list of records that have same hash function value. Chaining is simple, but requires additional memory outside the table.

### Open Addressing: 
In open addressing, all elements are stored in the hash table itself. Each table entry contains either a record or NIL. When searching for an element, we examine the table slots one by one until the desired element is found or it is clear that the element is not in the table.

### The following collision handling techniques are discussed in more detail below:


#### 1) Separate Chaining

Each bucket in the hash table holds a linked list or other data structure to handle collisions.

Colliding elements are stored in these data structures at the same index.

Offers flexibility and simplicity but may consume more memory due to additional data structures.

#### 2) Open Addressing

All elements are stored directly within the hash table, without using external data structures.

When a collision occurs, the algorithm probes for an alternative empty slot in the table.

Avoids memory overhead but can suffer from clustering and degraded performance.

#### 3) Double Hashing

A probing technique used in open addressing to resolve collisions.

Utilizes a secondary hash function to calculate the interval between probes.

Helps distribute elements more evenly, reducing clustering and improving performance compared to simpler probing techniques.



 



##  <span style="color:blue">Implementation of Chaining in Python</span>

In [1]:
class MyHash:
    def __init__(self, b):
        self.BUCKET = b
        self.table = [[] for x in range(b)]

    def insert(self, x):
        i = x % self.BUCKET
        self.table[i].append(x)

    def remove(self, x):
        i = x % self.BUCKET
        if x in self.table[i]:
            self.table[i].remove(x)

    def search(self, x):
        i = x % self.BUCKET
        return x in self.table[i]


h = MyHash(7)
h.insert(70)
h.insert(71)
h.insert(9)
h.insert(56)
h.insert(72)
print(h.search(56))
h.remove(56)
print(h.search(56))
h.remove(56)


True
False


![image.png](attachment:image.png)

##  <span style="color:blue">Open Addressing</span>

### Open Addressing:
Like separate chaining, open addressing is a method for handling collisions. In Open Addressing, all elements are stored in the hash table itself. So at any point, the size of the table must be greater than or equal to the total number of keys (Note that we can increase table size by copying old data if needed). This approach is also known as closed hashing. This entire procedure is based upon probing. Below are the types of probing:

#### 1. Linear Probing: 
In linear probing, the hash table is searched sequentially, starting from the original hash location. If that location is already occupied, we check the next location, wrapping around to the start of the table when the end is reached.

 

**The function used for rehashing is as follows: rehash(n) = (n + 1) % table-size, where n is the index that was just probed.**

#### 2. Quadratic Probing
In quadratic probing, the interval between probes grows quadratically with the number of attempts. This reduces the clustering of occupied slots that linear probing tends to produce. In this method, we look for the i²-th slot in the i-th iteration. We always start from the original hash location, and only if that location is occupied do we check the other slots.

Let hash(x) be the slot index computed using the hash function, and let S be the table size.

- If slot hash(x) % S is full, then we try (hash(x) + 1 * 1) % S.
- If (hash(x) + 1 * 1) % S is also full, then we try (hash(x) + 2 * 2) % S.
- If (hash(x) + 2 * 2) % S is also full, then we try (hash(x) + 3 * 3) % S.
- This continues for i = 4, 5, … until an empty slot is found.



Example: Let us consider a simple hash function as “key mod 5” and a sequence of keys that are to be inserted are 50, 70, 76, 85, 93. 

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)
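
As a rough sketch, the quadratic probing rule can be applied to the same keys and the same "key mod 5" hash function as the example above. The function name and the use of None for empty slots are assumptions of this example, and the resulting positions are those of quadratic probing specifically.

```python
def quadratic_probe_insert(table, key):
    """Insert key using quadratic probing; return the slot used, or None."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i * i) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    # The quadratic sequence may not visit every slot, so this can
    # happen even when the table still has free slots
    return None


table = [None] * 5
for key in (50, 70, 76, 85, 93):
    print(key, "->", quadratic_probe_insert(table, key))

print(table)   # [50, 70, 76, 93, 85]
```

Key 85 hashes to slot 0, finds slots 0 and 1 occupied, and lands in slot (0 + 2 * 2) % 5 = 4, which is where quadratic probing differs from probing one slot at a time.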

##  <span style="color:blue">Double Hashing</span>

In double hashing, the interval between probes is computed by a second hash function, which reduces clustering. We use another hash function hash2(x) and look for the slot at offset **i * hash2(x)** in the i-th iteration.

Let hash(x) be the slot index computed using the first hash function, and let S be the table size.

- If slot hash(x) % S is full, then we try (hash(x) + 1 * hash2(x)) % S.
- If (hash(x) + 1 * hash2(x)) % S is also full, then we try (hash(x) + 2 * hash2(x)) % S.
- If (hash(x) + 2 * hash2(x)) % S is also full, then we try (hash(x) + 3 * hash2(x)) % S.
- This continues for i = 4, 5, … until an empty slot is found.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)
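
The probing rule for double hashing can be sketched as follows. The table size of 7, the second hash function `PRIME - (x % PRIME)`, and the keys are all assumptions chosen for this illustration; they are one common textbook choice, not the only one.

```python
TABLE_SIZE = 7   # assumed prime table size
PRIME = 5        # assumed prime smaller than TABLE_SIZE

def hash1(x):
    return x % TABLE_SIZE

def hash2(x):
    # Always in 1..PRIME, so the step between probes is never 0
    return PRIME - (x % PRIME)

def double_hash_insert(table, key):
    """Insert key using double hashing; return the slot used, or None."""
    for i in range(TABLE_SIZE):
        slot = (hash1(key) + i * hash2(key)) % TABLE_SIZE
        if table[slot] is None:
            table[slot] = key
            return slot
    return None


table = [None] * TABLE_SIZE
for key in (49, 63, 56, 52, 54, 48):
    print(key, "->", double_hash_insert(table, key))

print(table)   # [49, None, 63, 52, 56, 54, 48]
```

Keys 49, 63, and 56 all hash to slot 0, but because their hash2 values differ (2 for 63, 4 for 56) they end up in slots 0, 2, and 4 instead of forming one contiguous cluster.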

##  <span style="color:blue">Frequencies of Array Elements</span>

Given an array which may contain duplicates, print all elements and their frequencies.

#### Examples:

Input: arr[] = {10, 20, 20, 10, 10, 20, 5, 20}

Output:

    10 3
    20 4
    5  1

Input: arr[] = {10, 20, 20}

Output:

    10 1
    20 2

A **simple solution** is to run two loops. For every item, count the number of times it occurs. To avoid printing duplicates, keep track of processed items.

**Implementation:**

In [2]:
# Python 3 program to count frequencies
# of array items
def countFreq(arr, n):
    
    # Mark all array elements as not visited
    visited = [False for i in range(n)]

    # Traverse through array elements 
    # and count frequencies
    for i in range(n):
        
        # Skip this element if already 
        # processed
        if (visited[i] == True):
            continue

        # Count frequency
        count = 1
        for j in range(i + 1, n, 1):
            if (arr[i] == arr[j]):
                visited[j] = True
                count += 1
        
        print(arr[i], count)

# Driver Code
if __name__ == '__main__':
    arr = [10, 20, 20, 10, 10, 20, 5, 20]
    n = len(arr)
    countFreq(arr, n)

10 3
20 4
5 1


#### Complexity Analysis:

- Time Complexity : O(n²)
- Auxiliary Space : O(n)

An efficient solution is to use hashing.

#### Implementation:

In [3]:
# Python3 program to count frequencies 
# of array items
def countFreq(arr, n):

    mp = dict()

    # Traverse through array elements 
    # and count frequencies
    for i in range(n):
        if arr[i] in mp.keys():
            mp[arr[i]] += 1
        else:
            mp[arr[i]] = 1
            
    # Traverse through map and print 
    # frequencies
    for x in mp:
        print(x, " ", mp[x])

# Driver code
arr = [10, 20, 20, 10, 10, 20, 5, 20 ]
n = len(arr)
countFreq(arr, n)


10   3
20   4
5   1


#### Complexity Analysis:

- Time Complexity : O(n) 
- Auxiliary Space : O(n)


#### In the above efficient solution, how to print elements in same order as they appear in input?

In [4]:
# Python3 program to count frequencies of array items
def countFreq(arr, n):
    
    mp = {}
    
    # Traverse through array elements and
    # count frequencies
    for i in range(n):
        if arr[i] not in mp:
            mp[arr[i]] = 0
        mp[arr[i]] += 1
        
    # To print elements according to first
    # occurrence, traverse array one more time
    # print frequencies of elements and mark
    # frequencies as -1 so that same element
    # is not printed multiple times.
    for i in range(n):
        if (mp[arr[i]] != -1):
            print(arr[i],mp[arr[i]])
        mp[arr[i]] = -1

# Driver code

arr = [10, 20, 20, 10, 10, 20, 5, 20]
n = len(arr)
countFreq(arr, n)


10 3
20 4
5 1


#### Complexity Analysis:

- Time Complexity : O(n) 
- Auxiliary Space : O(n)
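
As an alternative sketch, the standard library's `collections.Counter` performs the same hash-based counting. This relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onwards, so the elements come out in order of first appearance.

```python
from collections import Counter

arr = [10, 20, 20, 10, 10, 20, 5, 20]

# Counter is a dict subclass, so counting is O(n) on average
freq = Counter(arr)

for value, count in freq.items():
    print(value, count)
```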

##  <span style="color:blue">Implementation of Open Addressing in Python</span>

In [5]:
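class MyHash:
    # Open addressing with linear probing.
    # -1 marks a slot that was never used and -2 marks a deleted slot,
    # so the keys -1 and -2 themselves cannot be stored.
    def __init__(self, c):
        self.cap = c
        self.table = [-1] * c
        self.size = 0

    def hash(self, x):
        return x % self.cap

    def search(self, x):
        h = self.hash(x)
        t = self.table
        i = h
        # Probe until a never-used slot is reached; deleted slots are skipped
        while t[i] != -1:
            if t[i] == x:
                return True
            i = (i + 1) % self.cap
            if i == h:  # wrapped around the whole table
                return False
        return False

    def insert(self, x):
        if self.size == self.cap:
            return False

        if self.search(x):
            return False
        i = self.hash(x)
        t = self.table
        # Never-used and deleted slots can both be reused
        while t[i] not in (-1, -2):
            i = (i + 1) % self.cap

        t[i] = x
        self.size = self.size + 1
        return True

    def remove(self, x):
        h = self.hash(x)
        t = self.table
        i = h
        while t[i] != -1:
            if t[i] == x:
                t[i] = -2  # leave a marker so later probes continue past it
                self.size = self.size - 1  # the slot is free for future inserts
                return True
            i = (i + 1) % self.cap
            if i == h:
                return False
        return False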


h = MyHash(7)
h.insert(70)
h.insert(71)
h.insert(9)
h.insert(56)
h.insert(72)
print(h.search(56))
h.remove(56)
print(h.search(56))
h.remove(56)


True
False


False

### Time Complexity:

- Average case: O(1).
- Worst case: O(n), because of the clusters that can form.

### Space Complexity

- The space complexity of the hash table is O(n).

##  <span style="color:blue">Chaining vs Open Addressing</span>

![image.png](attachment:image.png)

##  <span style="color:blue">Set in Python</span>

A set is an unordered collection data type that is iterable, mutable, and has no duplicate elements.

Sets are written with { } (values enclosed in curly braces).

The major advantage of using a set, as opposed to a list, is that it has a highly optimized method for checking whether a specific element is contained in the set. This is based on a data structure known as a hash table. Since sets are unordered, we cannot access items using indexes as we do with lists.

### Methods for Sets


####  Adding elements to Python Sets
Insertion into a set is done with the set.add() function, which stores the element in the underlying hash table. Like a membership check, it takes O(1) time on average; in the worst case it can take O(n).

####  Union operation on Python Sets
Two sets can be merged using the union() function or the | operator. The elements of both hash tables are traversed and combined, and duplicates are dropped along the way. The time complexity is O(len(s1) + len(s2)), where s1 and s2 are the two sets being merged.

####  Intersection operation on Python Sets
This can be done with the intersection() function or the & operator, which selects the elements common to both sets. It iterates over the smaller set and checks each element against the other set's hash table. The time complexity is O(min(len(s1), len(s2))), where s1 and s2 are the two sets being intersected.

 

####  Finding Difference of Sets in Python
The difference of two sets is found with the difference() function or the - operator. The time complexity of finding the difference s1 - s2 is O(len(s1)).

 

####  Clearing Python Sets
The clear() method empties the whole set in place.

###  Set Creation in Python:



In [7]:
s1 = {10, 20, 30}

print(s1)

s2 = set([20, 30, 40])

print(s2)

s3 = {}

print('expected type set',type(s3))

s4 = set()

print(type(s4))

print(s4)


{10, 20, 30}
{40, 20, 30}
expected type set <class 'dict'>
<class 'set'>
set()


### Insertion

The set add() method adds a given element to a set if the element is not already present. The update() method adds every element of one or more iterables.

 

In [8]:
s = {10, 20}

s.add(30)

print(s)

s.add(30)  # adding duplicate items
print(s)

s.update([40, 50])

print(s)

s.update([60, 70], [80, 90])  # inserting multiple list

print(s)


{10, 20, 30}
{10, 20, 30}
{40, 10, 50, 20, 30}
{70, 40, 10, 80, 50, 20, 90, 60, 30}


### Removal of element from set

In [10]:
s = {10, 30, 20, 40}

s.discard(30)

print(s)

s.remove(20)

print(s)

s.clear()

print(s)

s.add(50)

del s


{40, 10, 20}
{40, 10}
set()
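
The cell above uses both discard() and remove(). The difference only shows when the element is missing, as this small sketch illustrates: discard() does nothing, while remove() raises a KeyError.

```python
s = {10, 20}

# discard() silently ignores a missing element
s.discard(99)

# remove() raises KeyError for a missing element
try:
    s.remove(99)
except KeyError as exc:
    print("KeyError:", exc)

print(s)
```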


###  Operations on two sets (1)



In [11]:
s1 = {2, 4, 6, 8}

s2 = {3, 6, 9}

print('union ', s1 | s2)
print(s1.union(s2))

print('intersection', s1 & s2)
print(s1.intersection(s2))

print('present in s1, but not present in s2', s1 - s2)
print(s1.difference(s2))

print('symmetric differences, not present in both', s1 ^ s2)


union  {2, 3, 4, 6, 8, 9}
{2, 3, 4, 6, 8, 9}
intersection {6}
{6}
present in s1, but not present in s2 {8, 2, 4}
{8, 2, 4}
symmetric differences, not present in both {2, 3, 4, 8, 9}


### Operations on two sets (2)

In [12]:
s1 = {2, 4, 6, 8}
s2 = {4, 8}

print('disjoint sets:', s1.isdisjoint(s2))

print('isSubset:', s1 <= s2)
print(s1.issubset(s2))

print('proper subset:', s1 < s2)

print('s1 is superset of s2:', s1 >= s2)
print(s1.issuperset(s2))

print('s1 is proper superset of s2:', s1 > s2)


disjoint sets: False
isSubset: False
False
proper subset: False
s1 is superset of s2: True
True
s1 is proper superset of s2: True


##  <span style="color:blue">Dictionary in Python</span>

### Python Dictionary

A dictionary in Python is a collection of key-value pairs, used to store data like a map. Unlike other data types, which hold only a single value per element, each element of a dictionary holds a key and its associated value.

### Example of Dictionary in Python 
A dictionary holds key:value pairs. Storing a value under a key allows it to be looked up quickly by that key.

### Creating a Dictionary
In Python, a dictionary can be created by placing a sequence of key:value pairs within curly braces {}, separated by commas. Values in a dictionary can be of any data type and can be duplicated, whereas keys cannot be repeated and must be hashable (in practice, immutable types such as strings, numbers, and tuples).

Note: Dictionary keys are case sensitive; the same name with different letter cases is treated as distinct keys.
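
A small sketch of the two rules above, using made-up keys: keys that differ only in case are distinct, and an unhashable object such as a list is rejected as a key.

```python
d = {}

# Keys that differ only in case are stored separately
d["Name"] = 1
d["name"] = 2

# A tuple is hashable, so it is a valid key
d[(28, 77)] = "tuple key"

# A list is mutable and unhashable, so it cannot be a key
try:
    d[[28, 77]] = "list key"
except TypeError as exc:
    print("TypeError:", exc)

print(len(d))   # 3
```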



In [13]:
d = {110: 'abc', 101: 'xyz', 105: 'pqr'}

print(d)

d = {}
d['laptop'] = 40000
d['mobile'] = 15000
d['earphone'] = 1000

print(d)

print(d['mobile'])


{110: 'abc', 101: 'xyz', 105: 'pqr'}
{'laptop': 40000, 'mobile': 15000, 'earphone': 1000}
15000


### Accessing Key value pairs:

In [14]:
d = {110: 'abc', 101: 'xyz', 105: 'pqr'}

print(d.get(101))

print(d.get(125))

print(d.get(125, "NA"))

if 125 in d:
    print(d[125])
else:
    print("NA")


xyz
None
NA
NA


### Removal of element in dictionary

In [15]:
d = {110: 'abc', 101: 'xyz', 105: 'pqr', 106: 'bcd'}

d[101] = 'wxy'

print(len(d))

print(d)

print('returning and removing 105', d.pop(105))

print('after removing 105', d)

del d[106]

print(d)

d[108] = 'cde'

print('returning and removing last inserted', d.popitem())


4
{110: 'abc', 101: 'wxy', 105: 'pqr', 106: 'bcd'}
returning and removing 105 pqr
after removing 105 {110: 'abc', 101: 'wxy', 106: 'bcd'}
{110: 'abc', 101: 'wxy'}
returning and removing last inserted (108, 'cde')


### Time Complexity : To insert, search and delete the elements in dictionary:

- Average case: O(1). The hash function ideally maps each key to its own slot, so the key can be inserted or found directly.
- Worst case: O(n). This occurs when many keys collide, so that a long sequence of slots has to be examined, or, for insertion, when the dictionary has to resize itself.

##  <span style="color:blue">Count distinct Elements in a List</span>

### Count distinct elements in an array

Given an unsorted array arr[] of length N, the task is to count all distinct elements in arr[].

Examples: 

 

Input :   arr[] = {10, 20, 20, 10, 30, 10}
Output : 3
Explanation: There are three distinct elements 10, 20, and 30.

Input :   arr[] = {10, 20, 20, 10, 20}
Output : 2

### Naive Approach:

Create a count variable and run two loops: one with counter i from 0 to N-1 to traverse arr[], and a second with counter j from 0 to i-1 to check whether the i-th element has appeared before. If it has not, increment the count.

Below is the implementation of the above approach.



In [16]:
# Python3 program to count distinct
# elements in a given array


def countDistinct(arr, n):

    res = 0

    # Pick all elements one by one
    for i in range(n):
        # Check whether arr[i] already appeared in arr[0..i-1]
        for j in range(i):
            if arr[i] == arr[j]:
                break
        else:
            # No earlier occurrence found, so arr[i] is a new distinct element
            res += 1

    return res


In [17]:
def cDistinct(l):
    res = 1
    
    for i in range(1,len(l)):
        if l[i] not in l[0:i]:
            res = res+1
    
    return res
    
l = [10,20,10,30,30,20]

print(cDistinct(l))


def cDistinct2(l):
    return len(set(l))

print(cDistinct2(l))



3
3
