I once had to analyse an awkward data set: I was recording data while pumping my vacuum chamber down from atmospheric pressure to almost zero. The pump-down was extremely slow, so I collected gigabytes of data, but pumping down is also a very non-uniform process.
As a result, I had very different amounts of data in different pressure ranges: for example, over a thousand data points between 0 and 100 atm but just 30 between 600 and 700 atm.
This makes the analysis quite tricky. How would you handle it?
My supervisor suggested binning the data by pressure range and then taking the same number of data points, N, from every bin. To keep the bins statistically comparable, N has to be the number of data points in the smallest bin. In the example above, every pressure bin should therefore contribute 30 data points (min_bin), 30 being the smallest number of data points found in any of the bins 0-100 atm, 100-200 atm, 200-300 atm and so on.
But now another problem arises: which 30 data points of each bin do I choose? The first 30, the last 30, the middle 30...?
There is no principled way to pick one of these, so the only unbiased option is to choose the points at random.
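The idea can be sketched without any library, just to make the logic explicit. This is only an illustration: the pressures are made-up numbers, the fixed bin width of 100 is an assumption, and `bins`, `bin_width` and `subsample` are names invented for this example.

```python
import random

random.seed(0)  # fixed seed so the random draw is reproducible

# Hypothetical pressures: many points at low pressure, few at high pressure
pressures = [random.uniform(0, 100) for _ in range(1000)]
pressures += [random.uniform(600, 700) for _ in range(30)]

bin_width = 100  # assumed bin width, in the same units as the pressures

# Group the values by bin index (0 for 0-100, 1 for 100-200, ...)
bins = {}
for p in pressures:
    bins.setdefault(int(p // bin_width), []).append(p)

# The smallest non-empty bin fixes how many points every bin may keep
min_bin = min(len(values) for values in bins.values())

# Draw min_bin points at random, without replacement, from each bin
subsample = {k: random.sample(values, min_bin) for k, values in bins.items()}

for k in sorted(subsample):
    print("bin", k, "had", len(bins[k]), "points, kept", len(subsample[k]))
```

Because the draw is random, a different seed gives a different subsample; fixing the seed is only useful if the analysis has to be repeatable.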
Pandas makes it very easy to bin the data points and sample them at random, and I will walk through the program step by step.
The data file is organized as a table with several columns. The first column contains the pressure values; the other columns contain other quantities, for example how long the laser light takes to decay in the chamber.
I would like to obtain a table with one row per pressure bin in which the remaining columns are averaged over min_bin randomly chosen elements.
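To make the table layout concrete, here is a small sketch that reads a tab-separated table without a header row. The three rows of numbers are invented for illustration and stand in for the real file; only the layout (pressure first, other quantities after) comes from the description above.

```python
import io
import pandas as pd

# Invented stand-in for the data file: pressure, then two other quantities
raw = "700.0\t1.20\t0.5\n650.0\t1.18\t0.4\n10.0\t0.90\t0.1\n"

# With header=None pandas labels the columns 0, 1, 2, ...
table = pd.read_table(io.StringIO(raw), header=None)

# Give the pressure column a readable name
table = table.rename(columns={0: "P"})
print(table)
```

The remaining columns keep their integer labels, which is why only column 0 needs to be renamed before sorting and binning by pressure.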

In [None]:
import pandas as pd
import numpy as np

#I import the file to be analyzed, file.txt, with pandas and read it as a table. 
#The data file columns are tab separated
merged_values=pd.read_table("file.txt",header=None)

#I rename the column I am interested in (the pressure column, labelled 0 by pandas) as "P"
merged_values.rename(columns={0: "P"},inplace=True)

#I sort the values of the table according to pressure
merged_values=merged_values.sort_values(by=['P'])

#I split the pressure range into n_bins equal-width bins with the Pandas function "cut",
#then I group the rows by bin. apply(lambda x: x) turns the grouped object back into
#a Pandas dataframe with the rows ordered bin by bin, so that I can slice it later.
n_bins=90
merged_values_grouped=merged_values.groupby(pd.cut(merged_values['P'],n_bins)).apply(lambda x: x) 

#I count the number of elements in each bin, convert the counts into an array
#and find the smallest count. observed=True drops empty bins, which would
#otherwise force min_bin to 0
bin_count=merged_values.groupby(pd.cut(merged_values['P'],n_bins),observed=True).count()
bin_count_elements=np.array(bin_count['P'])
min_bin=bin_count_elements.min()

#Lists that will hold the per-bin means and standard deviations
merged_values_random=[]
merged_values_random_std=[]


for i in range(len(bin_count_elements)):
    #Number of rows in all the previous bins, i.e. the row at which bin i starts
    count=sum(bin_count_elements[0:i])
    
    #I select the rows of bin i from the binned data frame using the variable "count",
    #which is recomputed at every cycle. "count" is the row of the first element
    #of the bin, "count+bin_count_elements[i]" is one past the last:
    
    bin_elements=merged_values_grouped[count:count+bin_count_elements[i]] #All the rows of a single bin,
                                                                          #ready to be sampled at random
    
    #sample() draws rows at random without replacement; I keep min_bin of them
    bin_elements=bin_elements.sample(min_bin)
    
    #I append the mean and standard deviation of every column for this bin to the two lists
    merged_values_random.append(bin_elements.mean())
    merged_values_random_std.append(bin_elements.std())
    
    #These arrays can be used further or just dumped in a file
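
For comparison, the same binning and random subsampling can be written more compactly with `groupby(...).sample()`. This is only a sketch, not the program above: it assumes pandas 1.1 or newer, and the data frame `df`, its column `tau` and the choice of 7 bins are made up for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the measurement table (invented values):
# "P" is the pressure, "tau" mimics a decay time that depends on pressure
rng = np.random.default_rng(0)
pressure = np.concatenate([rng.uniform(0, 100, 1000), rng.uniform(100, 700, 300)])
df = pd.DataFrame({
    "P": pressure,
    "tau": 1.0 + 0.001 * pressure + rng.normal(0, 0.01, pressure.size),
})

n_bins = 7
bins = pd.cut(df["P"], n_bins)  # equal-width pressure bins

# observed=True keeps only the bins that actually contain data
counts = df.groupby(bins, observed=True)["P"].count()
min_bin = int(counts.min())

# Draw min_bin rows at random, without replacement, from every bin
subsample = df.groupby(bins, observed=True).sample(n=min_bin, random_state=0)

# Mean and standard deviation of every column, one row per bin
sub_bins = bins.loc[subsample.index]
summary = subsample.groupby(sub_bins, observed=True).agg(["mean", "std"])
print(summary)
```

Passing `random_state` makes the draw repeatable; leaving it out gives a fresh random subsample on every run, which is closer to what the loop above does.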