In [1]:
import numpy as np

We start by computing the probabilities for the case of coins thrown into buckets.
Let:
- N - number of buckets
- k - number of coins we throw (each coin lands in one of the N buckets uniformly at random, independently of the other coins)

For a specific bucket (say, bucket number 7), the probability that the first coin lands in it is 1/N, so the probability that bucket 7 is still empty after the first throw is 1-1/N.

After the second throw, the probability that it is still empty is (1-1/N)^2, because both coins must miss it and the throws are independent.

After k throws, the probability that bucket 7 is empty is (1-1/N)^k, since all k coins must miss it.

Now that we have the probability of a specific bucket being empty after k throws, we can calculate the probability of it not being empty (in other words, containing at least one coin) after k throws: 1-(1-1/N)^k.
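The per-bucket probabilities above can be checked numerically. The sketch below computes (1-1/N)^k exactly and estimates it by simulating many rounds of k throws with NumPy's `default_rng`. The function names `p_bucket_empty` and `simulate_bucket_empty` are hypothetical, not part of the original notebook, and buckets are numbered 0 to N-1, so "bucket 7" requires N to be at least 8.

```python
import numpy as np

def p_bucket_empty(N, k):
    """Exact probability that one fixed bucket is still empty after k throws."""
    return (1 - 1 / N) ** k

def simulate_bucket_empty(N, k, bucket=7, trials=20_000, seed=0):
    """Monte Carlo estimate of the same probability.

    Each row of `throws` is one experiment: k coins, each landing in a
    uniformly random bucket 0..N-1. We count the rows where `bucket` was never hit.
    """
    rng = np.random.default_rng(seed)
    throws = rng.integers(0, N, size=(trials, k))
    empty = ~np.any(throws == bucket, axis=1)  # True where every coin missed the bucket
    return empty.mean()

N, k = 10, 10
print(f"exact P(empty):     {p_bucket_empty(N, k):.4f}")
print(f"simulated P(empty): {simulate_bucket_empty(N, k):.4f}")
print(f"exact P(not empty): {1 - p_bucket_empty(N, k):.4f}")
```

With N = k = 10, the exact probability of bucket 7 being non-empty is 1 - 0.9^10 ≈ 0.6513, which is also the per-bucket term of the formula for the first experiment below.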

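To get the expected number of non-empty buckets, we sum these probabilities over all N buckets. This works by linearity of expectation: the expected count is the sum of the expected values of the per-bucket indicators (1 if the bucket is non-empty, 0 otherwise), even though the indicators are not independent. Since every bucket has the same probability, the sum reduces to N times a single term:

$\sum_{i=1}^{N} \left(1-\left(1-\frac{1}{N}\right)^k\right) = N\left(1-\left(1-\frac{1}{N}\right)^k\right)$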
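A minimal sketch of the sum for a few (N, k) pairs; the helper names are hypothetical. It evaluates the sum term by term, the collapsed N-times-one-term form, and, as an addition not in the original derivation, the standard large-N approximation (1-1/N)^k ≈ e^{-k/N}, which gives N(1-e^{-k/N}).

```python
import math

def expected_by_sum(N, k):
    # Literal translation of the sum: one identical term per bucket i = 1..N.
    return sum(1 - (1 - 1 / N) ** k for _ in range(N))

def expected_closed_form(N, k):
    # All N terms are equal, so the sum is N times one term.
    return N * (1 - (1 - 1 / N) ** k)

def expected_approx(N, k):
    # Large-N approximation: (1 - 1/N)**k is close to exp(-k/N).
    return N * (1 - math.exp(-k / N))

for N, k in [(10, 10), (6000, 7000), (100000, 100000)]:
    print(f"N={N}, k={k}: sum={expected_by_sum(N, k):.4f}, "
          f"closed form={expected_closed_form(N, k):.4f}, "
          f"approx={expected_approx(N, k):.4f}")
```

For N = 10 the approximation is noticeably off (about 6.32 versus 6.51), while for N = 6000 it differs from the exact value by less than 1.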

The code below computes the value of this sum for 6000 buckets and 7000 coins (equivalently, 7000 random integers in the range 0 to 5999).

In [307]:
N = 6000
k = 7000

In [308]:
# Every bucket contributes the same probability 1-(1-1/N)^k, so we sum N copies of it
expected = np.sum(np.ones([N]) * (1 - np.power(1 - 1/N, k)))
expected

4131.76231974688
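Throwing k coins into N buckets is equivalent to drawing k random integers from 0 to N-1, where each integer is the index of the bucket a coin lands in. The number of non-empty buckets is then the number of distinct integers in the array, which is what the experiments below measure with `np.unique`. A small sketch of this equivalence, using `np.bincount` to count coins per bucket (the seed and the sizes are arbitrary choices for illustration):

```python
import numpy as np

# N buckets, k coins; each random integer in [0, N) is the bucket a coin lands in.
N, k = 10, 10
rng = np.random.default_rng(42)
coins = rng.integers(0, N, size=k)

# Coins per bucket: minlength=N gives one count for every bucket, including empty ones.
per_bucket = np.bincount(coins, minlength=N)
non_empty = np.count_nonzero(per_bucket)

# Distinct integers in the array, as measured in the experiments.
n_unique = len(np.unique(coins))

print("coins per bucket:", per_bucket)
print("non-empty buckets:", non_empty, "| unique values:", n_unique)
assert non_empty == n_unique
```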

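To verify the formula, we tried several combinations of the number of buckets and the number of throws (in other words, the range of the integers and the size of the array). For each combination we generated 100 random arrays and compared the mean number of unique values with the value predicted by the formula.

The code and its results are shown below.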

In [339]:
def expected(N, k):
    # Expected number of non-empty buckets (unique values): N * (1 - (1 - 1/N)^k)
    return np.sum(np.ones([N]) * (1 - np.power(1 - 1/N, k)))

In [340]:
N = 10
k = 10

unique_counts = []

# Repeat the experiment 100 times: draw k integers from [0, N) and count the distinct values
for _ in range(100):
    random_array = np.random.randint(N, size=k)
    unique_counts.append(len(np.unique(random_array)))

unique_counts = np.array(unique_counts)
print(f"unique values in the array: {np.mean(unique_counts)}")
print(f"Expected unique values from the formula: {expected(N, k)}")


unique values in the array: 6.46
Expected unique values from the formula: 6.513215598999999


In [349]:
N = 6000
k = 7000

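unique_counts = []

# Repeat the experiment 100 times: draw k integers from [0, N) and count the distinct values
for _ in range(100):
    random_array = np.random.randint(N, size=k)
    unique_counts.append(len(np.unique(random_array)))

unique_counts = np.array(unique_counts)
print(f"unique values in the array: {np.mean(unique_counts)}")
print(f"Expected unique values from the formula: {expected(N, k)}")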


unique values in the array: 4133.12
Expected unique values from the formula: 4131.76231974688


In [350]:
N = 10000
k = 7000

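unique_counts = []

# Repeat the experiment 100 times: draw k integers from [0, N) and count the distinct values
for _ in range(100):
    random_array = np.random.randint(N, size=k)
    unique_counts.append(len(np.unique(random_array)))

unique_counts = np.array(unique_counts)
print(f"unique values in the array: {np.mean(unique_counts)}")
print(f"Expected unique values from the formula: {expected(N, k)}")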


unique values in the array: 5032.42
Expected unique values from the formula: 5034.320775487754


In [351]:
N = 10000
k = 12000

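unique_counts = []

# Repeat the experiment 100 times: draw k integers from [0, N) and count the distinct values
for _ in range(100):
    random_array = np.random.randint(N, size=k)
    unique_counts.append(len(np.unique(random_array)))

unique_counts = np.array(unique_counts)
print(f"unique values in the array: {np.mean(unique_counts)}")
print(f"Expected unique values from the formula: {expected(N, k)}")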


unique values in the array: 6993.77
Expected unique values from the formula: 6988.23860403129


In [352]:
N = 100000
k = 100000

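unique_counts = []

# Repeat the experiment 100 times: draw k integers from [0, N) and count the distinct values
for _ in range(100):
    random_array = np.random.randint(N, size=k)
    unique_counts.append(len(np.unique(random_array)))

unique_counts = np.array(unique_counts)
print(f"unique values in the array: {np.mean(unique_counts)}")
print(f"Expected unique values from the formula: {expected(N, k)}")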


unique values in the array: 63202.56
Expected unique values from the formula: 63212.239823175325


In every case the simulated mean is close to the formula's prediction: within 1% for N = 10, and within 0.1% for the larger arrays.
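The five experiments above follow the same pattern, so they can also be run in a single loop that reports the relative error for each case. This is a sketch assuming 100 trials per case, as in the original cells; the helper names are hypothetical, and it uses NumPy's `default_rng` generator instead of `np.random.randint`, so the simulated means will differ slightly from the outputs shown above.

```python
import numpy as np

def expected_unique(N, k):
    """Formula from above: N * (1 - (1 - 1/N)**k)."""
    return N * (1 - (1 - 1 / N) ** k)

def simulated_unique(N, k, trials=100, seed=0):
    """Mean number of distinct values over `trials` arrays of k integers from [0, N)."""
    rng = np.random.default_rng(seed)
    counts = [len(np.unique(rng.integers(0, N, size=k))) for _ in range(trials)]
    return float(np.mean(counts))

# The (N, k) pairs used in the experiments above.
cases = [(10, 10), (6000, 7000), (10000, 7000), (10000, 12000), (100000, 100000)]

print(f"{'N':>7} {'k':>7} {'simulated':>11} {'formula':>11} {'rel. error':>10}")
results = []
for N, k in cases:
    sim, formula = simulated_unique(N, k), expected_unique(N, k)
    rel_err = abs(sim - formula) / formula
    results.append(rel_err)
    print(f"{N:>7} {k:>7} {sim:>11.2f} {formula:>11.2f} {rel_err:>10.4%}")
```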