Jaehoon9201/Python-data-preprocessing

Characteristics of FFT

  • Including [FFT_ex1.py], all examples perform the FFT over the entire interval. If you change the FFT window (unit: number of samples, or seconds), the frequency resolution changes in proportion to Fsamp/FFTwindow: a smaller window gives a coarser resolution, even though the range of frequency analysis does not change.

  • Reference

    • Lowest Detectable Frequency

      • F0 = 5*(SR/Window Size)
      • e.g., F0 = 5*(44100/1024) ≃ 215 Hz.
    • WS = 5*Fsamp/F0

      • F0 = 440 Hz --> WS = 501
      • F0 = 100 Hz --> WS = 2205
    • TR = Window Size/Fsamp

      • If the window size is big, the time resolution becomes coarser (fewer time slices).
    • FR = Fsamp/Window Size

      • The more bins, the more slices of frequency range we get.
  • Summary (Reference)

    • A greater frequency resolution results in a smaller time resolution; a quick numeric check follows.
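As a quick check of the relations above, here is a minimal sketch (the sampling rate and window size are the illustrative values from the example, not repository code):

sr = 44100          # sampling rate [Hz]
window_size = 1024  # FFT window size [samples]

F0 = 5 * sr / window_size   # lowest detectable frequency, ~215.3 Hz
TR = window_size / sr       # time resolution, ~23.2 ms per window
FR = sr / window_size       # frequency resolution, ~43.1 Hz per bin
print(F0, TR, FR)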

FFT_ex1.py

  • This example code is a reproduction of the code in the following reference: Reference

Data Shape!

Frequency Analysis of the Noise - by FFT

The second graph in the plot below shows the FFT analysis of the noise signal added to the original signal. The horizontal axis represents the frequency, and the vertical axis represents the magnitude of each frequency.

FFT_ex2.py

def FFT(sample_rate, duration, signal):
    ...
    return xf, yf, phase_ang

  • 😟 Why are the results different from those of ex1?
    • The [ex1] code obtains values on a power-spectrum scale (a comparison sketch follows the snippet below).

      fhat = np.fft.fft(f,n)                     #compute the FFT
      PSD = fhat * np.conj(fhat)/n               #power spectrum 
      freq = (1/(dt*n)) * np.arange(n)           #x-axis of frequencies 
      L = np.arange(1,np.floor(n/2),dtype='int') #only plot 1st half
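To see why the results differ, here is a minimal, self-contained comparison sketch (the toy signal is my assumption, not the repository's data): the ex1 power spectrum and an ex2/ex3-style amplitude spectrum peak at the same bins and differ only in vertical scale.

      import numpy as np

      dt = 0.001
      t = np.arange(0, 1, dt)
      f = np.sin(2*np.pi*50*t) + np.sin(2*np.pi*120*t)   # toy signal
      n = len(f)

      fhat = np.fft.fft(f, n)
      PSD = (fhat * np.conj(fhat)).real / n   # ex1 scale: power spectrum
      amp = 2 * np.abs(fhat) / n              # ex2/ex3-style amplitude scale
      # both spectra peak at the 50 Hz and 120 Hz bins; only the scaling differs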

FFT_ex3.py

def FFT(sample_rate, duration, signal):
    ...
    return xf, amplitude_Hz, phase_ang

  • 😟 Is the ex3 result the same as MATLAB's positiveFFT? The MATLAB code is shown below.
    • Ans: not perfectly the same as the result of ex3.

    • Reason: the MATLAB code below normalizes the data as 'X=fft(x)/N*2; % normalize the data', whereas the Python [ex3] code takes the magnitude values after the 'abs' function ('amplitude_Hz = 2*abs(Y)'). A Python translation sketch follows the MATLAB listing.
      
      % test.m
      clear;clc;
      dt = 0.001; 
      t= 0:dt:1;
      noise = 2.5 .*rand(length(t),1 );
      x= sin(2*pi*50*t) + sin(2*pi*120*t);
      
      positiveFFT(x',length(x'));
      
      Magnitude=(abs(ans))';
      Order=0:1:length(Magnitude)-1;
      bar(Order, Magnitude);
      
      grid on;
      xlabel('Harmonic Order','FontSize',15);
      ylabel('Mag','FontSize',15);
      set(gca,'FontSize',15);
      grid on;
      % positiveFFT.m
      function [X,freq]=positiveFFT(x,Fs)
         N=length(x); %get the number of points
         k=0:N-1;     %create a vector from 0 to N-1
         T=N/Fs;      %get the frequency interval
         freq=k/T;    %create the frequency range
         X=fft(x)/N*2; % normalize the data
         if X(1)~=X(N)
             X(1)=X(1)/2;  % do not double the DC bin
         end
         %only want the first half of the FFT, since it is redundant
         cutOff = ceil(N/2); 
      
         %take only the first half of the spectrum
         X = X(1:cutOff);
         freq = freq(1:cutOff);
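
For reference, here is a hedged Python translation of the positiveFFT normalization above (my sketch, not code from the repository):

import numpy as np

def positive_fft(x, fs):
    # Python sketch of the MATLAB positiveFFT above (illustrative)
    n = len(x)
    freq = np.arange(n) * fs / n    # frequency bins: k/T with T = n/fs
    X = np.fft.fft(x) / n * 2       # normalize as in the MATLAB code
    X[0] = X[0] / 2                 # the DC bin must not be doubled
    cutoff = int(np.ceil(n / 2))    # the second half is redundant
    return X[:cutoff], freq[:cutoff]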







Sampler-and-Generator.py

You can do [1] and [2] using this file.
[1] Sampling
Sampling from Github_set.csv
[2] Generator
You can generate extra columns like the ones below in the newly generated .csv file.

The columns to be stacked can be set in the part shown below; a concatenation sketch follows the code.

StackedNum = 4               # TobeStacked's copy Num
Samp_Num = 12 * StackedNum   # Sampling from [Github_gener_ex.csv] file.

TobeStacked_all = []

# 12
TobeStacked = np.array([[0, 0, 3, 0, 0, 3], [0, 0, 3, 0, 0, -3], [0, 0, -3, 0, 0, 3], [0, 0, -3, 0, 0, -3],
                        [0, 3, 0, 0, 3, 0], [0, 3, 0, 0, -3, 0], [0, -3, 0, 0, 3, 0], [0, -3, 0, 0, -3, 0],
                        [3, 0, 0, 3, 0, 0], [3, 0, 0, -3, 0, 0], [-3, 0, 0, 3, 0, 0], [-3, 0, 0, -3, 0, 0]])
TobeStacked = TobeStacked.reshape(-1, 6)
TobeStacked = pd.DataFrame(TobeStacked)
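
A minimal sketch of how the stacked copies could be concatenated (my assumption about the generator's behavior; the template rows are truncated for brevity):

import numpy as np
import pandas as pd

StackedNum = 4
TobeStacked = pd.DataFrame(np.array([[0, 0, 3, 0, 0, 3],
                                     [0, 0, 3, 0, 0, -3]]))   # truncated template
# stack StackedNum copies of the template on top of each other
TobeStacked_all = pd.concat([TobeStacked] * StackedNum, ignore_index=True)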







SVM/SVM_classifier.py

svm_classifier using scikit-learn

If you have more than 3 data groups (classes),
you should use the code below to ensure the visualization works.

z = clf.predict(xy)

If you have fewer than 3 data groups,
you can instead use the code below to visualize the margins as well (a full plotting sketch follows).

z = clf.decision_function(xy)
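
Here is a minimal, self-contained sketch of the binary case (the toy dataset and the grid names xx, yy, xy are illustrative, not from the repository code):

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

X, y = datasets.make_blobs(n_samples=100, centers=2, random_state=0)
clf = svm.SVC(kernel='linear').fit(X, y)

# evaluation grid over the data range
xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 200),
                     np.linspace(X[:, 1].min(), X[:, 1].max(), 200))
xy = np.c_[xx.ravel(), yy.ravel()]

# binary case: decision_function gives signed distances,
# so the margins can be drawn at levels -1 and +1
z = clf.decision_function(xy).reshape(xx.shape)
plt.contour(xx, yy, z, levels=[-1, 0, 1], linestyles=['--', '-', '--'])
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()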







PCA_Reconstructing > pca_various_plt_test.py

image

The description below is based on this reference:

How to reverse PCA and reconstruct original variables from several principal components?

PCA computes eigenvectors of the covariance matrix ("principal axes") and sorts them by their eigenvalues (amount of explained variance). The centered data can then be projected onto these principal axes to yield principal components ("scores"). For the purposes of dimensionality reduction, one can keep only a subset of principal components and discard the rest. (See here for a layman's introduction to PCA.)

Let Xraw be the n×p data matrix with n rows (data points) and p columns (variables, or features). After subtracting the mean vector μ from each row, we get the centered data matrix X. Let V be the p×k matrix of some k eigenvectors that we want to use; these would most often be the k eigenvectors with the largest eigenvalues. Then the n×k matrix of PCA projections ("scores") will be simply given by Z=XV.

This is illustrated on the figure below: the first subplot shows some centered data (the same data that I use in my animations in the linked thread) and its projections on the first principal axis. The second subplot shows only the values of this projection; the dimensionality has been reduced from two to one:

image

In order to be able to reconstruct the original two variables from this one principal component, we can map it back to p dimensions with V⊤. Indeed, the values of each PC should be placed on the same vector as was used for projection; compare subplots 1 and 3. The result is then given by X^=ZV⊤=XVV⊤. I am displaying it on the third subplot above. To get the final reconstruction X^raw, we need to add the mean vector μ to that:

PCA reconstruction = PC scores ⋅ Eigenvectors⊤ + Mean. Note that one can go directly from the first subplot to the third one by multiplying X with the VV⊤ matrix; it is called a projection matrix. If all p eigenvectors are used, then VV⊤ is the identity matrix (no dimensionality reduction is performed, hence "reconstruction" is perfect). If only a subset of eigenvectors is used, it is not identity.

This works for an arbitrary point z in the PC space; it can be mapped to the original space via x^=zV⊤.
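
A minimal NumPy sketch of these formulas (the toy data is my assumption):

import numpy as np

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(100, 5))     # n x p toy data

mu = X_raw.mean(axis=0)
X = X_raw - mu                        # centered data

# principal axes via SVD (right singular vectors = eigenvectors of cov)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
V = Vt[:k].T                          # p x k matrix of the top-k axes

Z = X @ V                             # n x k scores: Z = XV
X_hat_raw = Z @ V.T + mu              # reconstruction: Z V^T + mean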

Discarding (removing) leading PCs

Sometimes one wants to discard (to remove) one or few of the leading PCs and to keep the rest, instead of keeping the leading PCs and discarding the rest (as above). In this case all the formulas stay exactly the same, but V should consist of all principal axes except for the ones one wants to discard. In other words, V should always include all PCs that one wants to keep.

Caveat about PCA on correlation

When PCA is done on correlation matrix (and not on covariance matrix), the raw data Xraw is not only centered by subtracting μ but also scaled by dividing each column by its standard deviation σi. In this case, to reconstruct the original data, one needs to back-scale the columns of X^ with σi and only then to add back the mean vector μ.

Image processing example

This topic often comes up in the context of image processing. Consider Lenna -- one of the standard images in image processing literature (follow the links to find where it comes from). Below on the left, I display the grayscale variant of this 512×512 image.

image

We can treat this grayscale image as a 512×512 data matrix Xraw. I perform PCA on it and compute X^raw using the first 50 principal components. The result is displayed on the right.
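
A minimal sketch of this image experiment (scikit-learn's PCA is used for brevity, and a random array stands in for the grayscale image):

import numpy as np
from sklearn.decomposition import PCA

img = np.random.rand(512, 512)        # stand-in for the 512x512 grayscale image
pca = PCA(n_components=50).fit(img)   # rows are treated as data points
img_hat = pca.inverse_transform(pca.transform(img))   # keep the first 50 PCs only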








Data save to a text file

  • It creates a text file with the target data.

  • If you use the code below, you can split the data into separate lines.

        # file_1, file_2: open text-file handles; data1: the target data sequence
        for i in range(len(data1)):
            if i % 10 == 0 and i > 0:    # start a new line every 10 values
                file_1.write('\n')
                file_1.write(',')
                file_2.write('\n')
                file_2.write(',')
            elif i % 10 != 0 and i > 0:  # otherwise separate values with commas
                file_1.write(',')
                file_2.write(',')
    • All data are converted to strings before being written.

CompSens (Compressive Sensing)

Real time-domain plot

image

DCT domain plot

image
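
A minimal sketch of producing both plots (illustrative; it uses SciPy's dct, which is an assumption about the repository's implementation):

import numpy as np
import matplotlib.pyplot as plt
from scipy.fft import dct

t = np.linspace(0, 1, 1000)
x = np.cos(2*np.pi*50*t) + np.cos(2*np.pi*120*t)   # toy signal

X_dct = dct(x, norm='ortho')   # many natural signals are sparse in the DCT domain

fig, (ax1, ax2) = plt.subplots(2, 1)
ax1.plot(t, x)
ax1.set_title('time domain')
ax2.plot(X_dct)
ax2.set_title('DCT domain')
plt.show()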

Spectrogram_plot

If you want to place low-frequency results at the bottom, activate the code below (a complete sketch follows it).

spec = np.flip(spec, axis=0)
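
A minimal, self-contained sketch (the test signal and the SciPy call are my assumptions, not the repository code):

import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

fs = 10000
t = np.arange(0, 1, 1/fs)
x = np.sin(2*np.pi*100*t) + np.sin(2*np.pi*2000*t)

f, tt, spec = signal.spectrogram(x, fs)
spec = np.flip(spec, axis=0)               # low frequencies now at the bottom
plt.imshow(10*np.log10(spec + 1e-12), aspect='auto')
plt.show()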

Results are shown for 100 Hz, 200 Hz, 2000 Hz, and 10000 Hz.

wavelet_trans_basic_ex (= Continuous Wavelet Transform (CWT))








Path Example








CWT_v2 - ex

Mexican Hat Wavelet

What to consider

It outputs a constant shape with a normalization factor, meaning it has an 'energy' unit.

For this reason, if you want to plot the outputs, you can plot them directly.

plt.imshow(cwtmatr, cmap='viridis', aspect='auto',
           vmax=abs(cwtmatr).max(), vmin=-abs(cwtmatr).max())

Form

$$ \psi(t) = \frac{2}{\sqrt{3}\,\sqrt[4]{\pi}} \left(1 - t^2\right) e^{-t^2/2} $$

Complex Morlet Wavelets

What to consider

This wavelet has an imaginary part.

In cmorB-C, B and C represent:

  • B → bandwidth

  • C → center frequency

Also, if you want to plot the outputs, you should take the imaginary part into account.

In this code, I handle it as below.

    power = (abs(cwtmatr)) ** 2
    ...
    wt_power_results = np.log2(power)

Form

$$ \psi(t) = \frac{1}{\sqrt{\pi B}}\, e^{-t^2/B}\, e^{j 2 \pi C t} $$

How to set 'B' and 'C'?

The figure below is from the reference site above. It says that reasonable results come from scales above 2. In other words, when C = 1 and scale = 1, the highest frequency being analyzed equals the sampling frequency, which violates the Nyquist theorem (i.e., it cannot give a reasonable result).

image

How to set 'scales'?

In this project, 'scales' is set in the following way.

scale_min = 2
#scale_max = 300
scale_max = 130
scales = np.arange(scale_min, scale_max)

If the scales contain large values, the wavelet is stretched and becomes sensitive to the lower frequencies in the signal.

As mentioned above, the starting scale is recommended to be above 2 to avoid the Nyquist limitation.
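
Putting the pieces together, here is a minimal CWT sketch with PyWavelets (the signal and the wavelet name cmor1.5-1.0 are illustrative choices):

import numpy as np
import pywt

fs = 1000
t = np.arange(0, 1, 1/fs)
x = np.sin(2*np.pi*50*t)                   # toy signal

scales = np.arange(2, 130)                 # start above 2 (Nyquist caveat)
cwtmatr, freqs = pywt.cwt(x, scales, 'cmor1.5-1.0', sampling_period=1/fs)
power = np.abs(cwtmatr) ** 2               # complex output, so use |.|^2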

Running Results

  • If you want to see the original figure, refer to the PPT in this project.
  • Analysis
    • If the scale range includes large values, the output can be segmented more finely.
    • If the bandwidth of cmor is set high, the output can be segmented more finely in the time domain (but becomes more sensitive).
    • Caution: the frequency axes differ between the graphs (because the plot functions are configured differently).

image

CWT_v2 - ex2

You can verify the relation between two pairs of quantities:

  • the relation between frequencies and scales
  • the relation between the center frequency and scales

As mentioned above, if you set C = 1, reasonable results come from scales above 2.

If you want to get reasonable results from scale 1, you are recommended to set C below 1 (e.g., C = 0.5).
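
This relation can be checked directly with pywt.scale2frequency (a sketch; the sampling period is an assumed value):

import numpy as np
import pywt

dt = 1/1000                                    # assumed sampling period
scales = np.arange(1, 131)
for name in ['cmor1.5-1.0', 'cmor1.5-0.5']:    # C = 1 vs C = 0.5
    freqs = pywt.scale2frequency(name, scales) / dt
    print(name, freqs[:3])   # with C = 0.5, scale 1 maps to fs/2 instead of fs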

image

DCT

# assuming X = np.fft.fft(x), N = len(x), n_oneside = N // 2
# normalize the amplitude
X_mag = abs(X[:n_oneside]) * 1/n_oneside
X_mag[0] = X_mag[0]/2

# same as above (general form), since n_oneside = N/2
# normalize the amplitude
X_mag = abs(X[:n_oneside]) * 2/N
X_mag[0] = X_mag[0]/2

Results

  • Original signal

import numpy as np

# sampling rate
sr = 60
# sampling interval
ts = 1.0/sr
t = np.arange(0, 1, ts)

x = 1                             # DC component (amplitude 1 at 0 Hz)
freq = 1.
x += 2*np.sin(2*np.pi*freq*t)
freq = 2
x += 3*np.sin(2*np.pi*freq*t)
freq = 3
x += 4*np.sin(2*np.pi*freq*t)
freq = 29
x += 5*np.sin(2*np.pi*freq*t)
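
Applying the one-sided normalization from above to this signal (a sketch; it reuses x and the numpy import from the snippet above):

X = np.fft.fft(x)
N = len(x)
n_oneside = N // 2
X_mag = np.abs(X[:n_oneside]) * 2 / N
X_mag[0] = X_mag[0] / 2
# X_mag recovers the amplitudes 1, 2, 3, 4, 5 at 0, 1, 2, 3 and 29 Hz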

image

image

image
