# Project Selection
## Topic Selection
- **Overall idea**: Texture classification using contact microphone data from a robotic hand
- **Analysis approaches**:
  - 1D Convolutional Neural Network (CNN) on time-series data
  - 2D CNN on spectrogram
  - Hidden Markov Model (HMM) on time-series
  - HMM on spectrogram
- **Problem statement**: Can CNNs outperform HMMs at texture classification on acoustic contact-sensing data gathered from a robotic hand?
- **Relevant resources**:
  - [AU Dataset for Visuo-Haptic Object Recognition for Robots](/papers/AU%20Dataset%20for%20Visuo-Haptic%20Object%20Recognition%20for%20Robots.pdf)
    - [Link to dataset](https://figshare.com/articles/dataset/AU_Dataset_for_Visuo-Haptic_Object_Recognition_for_Robots/14222486)
  - [A Biomimetic Fingerprint for Robotic Tactile Sensing](/papers/A%20Biomimetic%20Fingerprint%20for%20Robotic%20Tactile%20Sensing.pdf)
    - [Link to dataset](https://figshare.com/articles/dataset/Supplementary_Material_for_A_Biomimetic_Fingerprint_for_Robotic_Tactile_Sensing_/21120982)
  - [Paper: identifying pill type from acoustic data gathered via a shaking motion](https://tams.informatik.uni-hamburg.de/people/jonetzko/publications/jonetzko2020multimodal.pdf)
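Both HMM approaches need a rule for turning per-class models into a label. One common setup, used in isolated-word recognition, fits one HMM per class (here, per texture) and assigns a recording to the class whose model gives it the highest likelihood. The numpy sketch below scores a sequence with the forward algorithm in log space, assuming diagonal-Gaussian emissions and hand-set toy parameters; the function names and the "smooth"/"rough" models are hypothetical, and training is omitted. In practice each model could be fitted with hmmlearn's `GaussianHMM`, whose `fit(X, lengths)` accepts a `lengths` array so that concatenated recordings are treated as separate sequences.

```python
import numpy as np


def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a))) along an axis."""
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    if axis is None:
        return float(out.squeeze())
    return np.squeeze(out, axis=axis)


def gaussian_log_emissions(x, means, variances):
    """Log-density of each frame under each state's diagonal Gaussian.

    x: (T, D) observations; means, variances: (K, D). Returns (T, K).
    """
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(np.log(2 * np.pi * variances)[None, :, :]
                         + diff ** 2 / variances[None, :, :], axis=2)


def sequence_log_likelihood(x, start_prob, trans_mat, means, variances):
    """Forward algorithm in log space: returns log P(x | model)."""
    log_b = gaussian_log_emissions(x, means, variances)  # (T, K)
    log_a = np.log(trans_mat)                            # (K, K), rows = from-state
    alpha = np.log(start_prob) + log_b[0]
    for t in range(1, len(x)):
        # alpha_t(j) = log sum_i exp(alpha_{t-1}(i) + log a_ij) + log b_j(x_t)
        alpha = logsumexp(alpha[:, None] + log_a, axis=0) + log_b[t]
    return logsumexp(alpha)


def classify(x, models):
    """Return the label whose HMM assigns x the highest log-likelihood."""
    return max(models, key=lambda label: sequence_log_likelihood(x, *models[label]))


# Two toy 2-state models over a 1-D feature (hypothetical parameters)
start = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1], [0.1, 0.9]])
models = {
    "smooth": (start, trans, np.array([[0.0], [1.0]]), np.ones((2, 1))),
    "rough": (start, trans, np.array([[5.0], [6.0]]), np.ones((2, 1))),
}
recording = np.random.default_rng(0).normal(5.5, 0.5, size=(50, 1))
print(classify(recording, models))  # rough
```

Working in log space matters here: multiplying hundreds of per-frame probabilities underflows to zero, while summing log-probabilities does not.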


### Research
- Apply a Hamming window *before* spectral analysis (from [Isolated-word speech recognition using hidden Markov models](/papers/Isolated-word%20speech%20recognition%20using%20hidden%20Markov%20models.pdf))
    - The paper also notes that, because training examples are concatenated, training captures a probability of transitioning from the last state back to the initial state, an artifact of joining separate examples
    - A modified Baum-Welch algorithm was developed to avoid this issue
- Remove ambient spectrum (from [A soft, amorphous skin that can sense and localize textures](/papers/A%20soft,%20amorphous%20skin%20that%20can%20sense%20and%20localize%20textures.pdf))
    - Details in [An efficient algorithm to estimate the instantaneous SNR of speech signals](/papers/An%20Efficient%20Algorithm%20to%20Estimate%20the%20Instantaneous%20SNR%20of%20Speech%20Signals.pdf) (estimates noise during periods of **speech** activity)
    - More details in [Spectral Subtraction Based on Minimum Statistics](/papers/Spectral%20Subtraction%20Based%20on%20Minimum%20Statistics%20THESIS.pdf)
- Standardize data (subtract the mean, divide by the standard deviation) and resample to account for differing time-series lengths (from [Fabric Classification Using a Finger-Shaped Tactile Sensor via Robotic Sliding](/papers/Fabric%20Classification%20Using%20a%20Finger-Shaped%20Tactile%20Sensor%20via%20Robotic%20Sliding.pdf))
- Extracting features from the time-series beyond frequency bins (from [Design of a Biomimetic Tactile Sensor for Material Classification](/papers/Design%20of%20a%20Biomimetic%20Tactile%20Sensor%20for%20Material%20Classification.pdf))
- Spectral subtraction (from [Evaluating Integration Strategies for Visuo-Haptic Object Recognition](/papers/Evaluating%20Integration%20Strategies%20for%20Visuo-Haptic%20Object%20Recognition.pdf))
    - Cited paper: [Suppression of Acoustic Noise in Speech Using Spectral Subtraction](/papers/Suppression%20of%20acoustic%20noise%20in%20speech%20using%20spectral%20subtraction.pdf) (estimate noise spectrum during periods of **non-speech** activity)
- Re-binning frequency spectrum (from [Stane: Synthesized surfaces for tactile input](/papers/Stane%20Synthesized%20surfaces%20for%20tactile%20input.pdf))
- Spectral noise subtraction ([explanations and examples](https://abhipray.com/posts/sigproc/classic_speech_enhancement/spectral_subtraction/))
- Classification approach: CNN + transformer (from [An Investigation of Multi-feature Extraction and Super-resolution with Fast Microphone Arrays](/papers/An%20Investigation%20of%20Multi-feature%20Extraction%20and%20Super-resolution%20with%20Fast%20Microphone%20Arrays.pdf))
- [Multimodal Object Analysis with Auditory and Tactile Sensing using Recurrent Neural Networks](/papers/Multimodal%20Object%20Analysis%20with%20Auditory%20and%20Tactile%20Sensing%20using%20Recurrent%20Neural%20Networks.pdf)
    - Mel Frequency Cepstral Coefficients (MFCCs) used for feature extraction (MFCC parameters found using Tree-structured Parzen Estimator (TPE) hyperparameter optimization)
    - LSTM-based architecture
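The Hamming-window item can be sketched as a short-time Fourier transform: split the signal into overlapping frames, multiply each frame by a Hamming window, and take the magnitude of the FFT. The resulting log spectrogram is the kind of input the 2D CNN and HMM-on-spectrogram approaches would consume. The frame length, hop size, sample rate, and the `log_spectrogram` name are assumptions for illustration, not values from the cited paper.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Log-magnitude spectrogram with a Hamming window applied to each frame.

    Returns an array of shape (n_frames, frame_len // 2 + 1).
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    # Stack overlapping frames into a (n_frames, frame_len) matrix
    frames = np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])
    # Window *before* the FFT to reduce spectral leakage from the frame edges
    spectrum = np.abs(np.fft.rfft(frames * window, axis=1))
    return np.log(spectrum + eps)


fs = 8000                               # hypothetical sample rate (Hz)
t = np.arange(fs) / fs                  # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)     # 1 kHz test tone
spec = log_spectrogram(tone)
print(spec.shape)                                # (61, 129)
print(spec.mean(axis=0).argmax() * fs / 256)     # 1000.0, the tone frequency
```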
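The ambient-spectrum removal and spectral-subtraction items share one idea: estimate the noise magnitude spectrum from frames that contain no contact (the **non-speech** periods in the cited spectral-subtraction paper), subtract it from every frame, and keep the result from going negative. A minimal magnitude-domain sketch, assuming the first `n_noise_frames` frames are ambient-only and using a simple spectral floor `beta` (both hypothetical choices); the minimum-statistics and instantaneous-SNR methods listed above estimate noise during activity instead and are not shown.

```python
import numpy as np


def spectral_subtract(mag_frames, n_noise_frames=10, beta=0.01):
    """Magnitude spectral subtraction.

    mag_frames: (n_frames, n_bins) linear-magnitude spectrogram.
    The noise estimate is the mean of the first n_noise_frames frames,
    which are assumed to be ambient-only (no contact).
    """
    noise = mag_frames[:n_noise_frames].mean(axis=0)
    cleaned = mag_frames - noise
    # Floor at a small fraction of the noise estimate instead of allowing negatives
    return np.maximum(cleaned, beta * noise)


noise_profile = np.array([1.0, 2.0, 0.5])
frames = np.tile(noise_profile, (20, 1))
frames[15:, 1] += 3.0                   # hypothetical contact energy in bin 1
out = spectral_subtract(frames)
print(out[0])                           # ambient frame: floored to beta * noise
print(out[-1])                          # contact frame: the extra 3.0 survives in bin 1
```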
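Standardizing by mean and standard deviation and resampling to a common length can be sketched with numpy alone. This assumes per-recording z-scoring and linear interpolation onto `target_len` points (a hypothetical parameter); the cited paper may have scaled per channel or over the whole dataset instead.

```python
import numpy as np


def standardize_and_resample(series, target_len=1000):
    """Z-score a 1-D series, then linearly resample it to target_len points."""
    series = np.asarray(series, dtype=float)
    std = series.std()
    z = (series - series.mean()) / (std if std > 0 else 1.0)  # guard constant signals
    # Place old and new samples on a shared [0, 1] axis and interpolate
    old_pos = np.linspace(0.0, 1.0, len(z))
    new_pos = np.linspace(0.0, 1.0, target_len)
    return np.interp(new_pos, old_pos, z)


short_rec = standardize_and_resample(np.arange(50), target_len=100)
long_rec = standardize_and_resample(np.arange(5000), target_len=100)
print(short_rec.shape, long_rec.shape)  # (100,) (100,)
```

Recordings of very different durations end up as equal-length, zero-mean vectors that can be batched together.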
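Features beyond frequency bins could include simple time-domain and spectral descriptors. The set below (RMS energy, zero-crossing rate, spectral centroid) and the `summary_features` name are illustrative assumptions, not the feature list from the cited paper.

```python
import numpy as np


def summary_features(signal, fs):
    """A few hand-crafted descriptors of one recording (illustrative choice)."""
    signal = np.asarray(signal, dtype=float)
    rms = np.sqrt(np.mean(signal ** 2))
    # Fraction of consecutive sample pairs whose sign changes
    zcr = np.mean(np.abs(np.diff(np.sign(signal))) > 0)
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    centroid = np.sum(freqs * mag) / np.sum(mag)  # magnitude-weighted mean frequency
    return {"rms": rms, "zcr": zcr, "spectral_centroid": centroid}


fs = 8000                                # hypothetical sample rate (Hz)
t = np.arange(fs) / fs
feats = summary_features(np.sin(2 * np.pi * 1000 * t + 0.3), fs)
print(feats)  # rms near 0.707, zcr near 0.25, spectral_centroid near 1000 Hz
```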
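Re-binning a frequency spectrum merges adjacent FFT bins into fewer, wider bands, which shrinks the feature vector and smooths bin-level noise. A sketch assuming equal-width bands formed by summing bins; averaging, or log-spaced bands, would be equally plausible choices, and `rebin_spectrum` is a hypothetical name.

```python
import numpy as np


def rebin_spectrum(mag, n_bands):
    """Sum adjacent bins of a (..., n_bins) spectrum into n_bands equal-width bands.

    Trailing bins that do not fill a whole band are dropped.
    """
    mag = np.asarray(mag)
    width = mag.shape[-1] // n_bands
    trimmed = mag[..., : width * n_bands]
    return trimmed.reshape(*mag.shape[:-1], n_bands, width).sum(axis=-1)


print(rebin_spectrum(np.arange(10), 3))          # [ 3 12 21], bin 9 dropped
print(rebin_spectrum(np.ones((4, 129)), 8).shape)  # (4, 8): 16 bins per band
```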
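MFCCs are typically computed from a power spectrum by applying a bank of triangular filters spaced evenly on the mel scale, taking the log of each filter's energy, and applying a DCT. The numpy sketch below follows that standard recipe; `n_mels=26` and `n_coeffs=13` are common defaults assumed here, not the tuned values from the cited paper, and libraries such as librosa provide tested implementations.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, fs):
    """Triangular filters evenly spaced on the mel scale: (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):      # rising edge
            fb[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge
            fb[m - 1, k] = (right - k) / (right - centre)
    return fb


def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs terms."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T


def mfcc(power_frames, fs, n_fft, n_mels=26, n_coeffs=13):
    """power_frames: (n_frames, n_fft // 2 + 1) power spectrum -> (n_frames, n_coeffs)."""
    mel_energy = power_frames @ mel_filterbank(n_mels, n_fft, fs).T
    return dct_ii(np.log(mel_energy + 1e-10), n_coeffs)


frames = np.random.default_rng(0).normal(size=(5, 256))  # stand-in for windowed frames
power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
coeffs = mfcc(power, fs=8000, n_fft=256)
print(coeffs.shape)  # (5, 13)
```

The number of filters and retained coefficients are exactly the kind of parameters a TPE search would tune.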
    
### Resources
- [HMMlearn speech example (1)](https://blog.goodaudience.com/music-genre-classification-using-hidden-markov-models-4a7f14eb0fd4)
- [HMMlearn speech example (2)](https://maharshi-yeluri.medium.com/understanding-and-implementing-speech-recognition-using-hmm-6a4e7666de1)
- [Multi-variate 1D CNN example](https://github.com/harryjdavies/Python1D_CNNs/blob/master/CCN1D_pytorch_activity.py)
- [CNN dimensions example](https://jinglescode.github.io/2020/11/01/how-convolutional-layers-work-deep-learning-neural-networks/)
- [Stack Overflow thread about packaging multi-channel time-series data for a 1D CNN](https://stackoverflow.com/questions/67842116/how-do-i-properly-package-multi-channel-time-series-data-and-build-the-network-f)
- [LSTM explanation](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)
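The dimension threads above come down to one convention: `nn.Conv1d` expects input shaped `(batch, channels, length)`, while recordings are often stored time-major as `(batch, length, channels)`. A numpy sketch of the reshaping, assuming a hypothetical dataset of 10 recordings from 2 contact microphones; on a PyTorch tensor the same transpose is `tensor.permute(0, 2, 1)`.

```python
import numpy as np

# Hypothetical dataset: 10 recordings, 4000 samples each, 2 contact microphones,
# stored time-major as (batch, length, channels)
recordings = np.random.default_rng(0).normal(size=(10, 4000, 2))

# Conv1d wants channels before time: (batch, channels, length)
conv_input = np.transpose(recordings, (0, 2, 1))
print(conv_input.shape)  # (10, 2, 4000)

# A single-channel signal still needs an explicit channel axis of size 1
mono = recordings[:, :, 0]            # (10, 4000)
mono_input = mono[:, np.newaxis, :]   # (10, 1, 4000)
print(mono_input.shape)
```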


```python
# Check that PyTorch is installed and working
import numpy as np
import torch
import torch.nn as nn

x = torch.rand(5, 3)
print(x)

# Experiment with nn.Conv1d dimensions: input is (batch, channels, length),
# here 3 samples x 2 channels x 4 time steps
data = np.array([[[1, 2, 3, 4], [1, 2, 3, 4]],
                 [[1, 2, 3, 4], [1, 2, 3, 4]],
                 [[1, 2, 3, 4], [1, 2, 3, 4]]])
# from_numpy already returns a tensor; wrapping it again in torch.tensor()
# only makes a copy and triggers a UserWarning
x_in = torch.from_numpy(data).float()

m_1 = nn.Conv1d(2, 64, kernel_size=1)  # 2 input channels -> 64 output channels
m_2 = nn.MaxPool1d(2)                  # halves the length: 4 -> 2
m_3 = nn.Flatten()                     # 64 channels * 2 steps = 128 features
m_4 = nn.Linear(128, 100)
m_5 = nn.Linear(100, 6)                # 6 outputs, one per class
m_6 = nn.Softmax(dim=1)                # explicit dim avoids the implicit-dim warning

print(x_in.size())
output = x_in
for layer in (m_1, m_2, m_3, m_4, m_5, m_6):
    output = layer(output)
    print(output.size())
```

```text
tensor([[0.0844, 0.5088, 0.1909],
        [0.0827, 0.7211, 0.7862],
        [0.4412, 0.8704, 0.2868],
        [0.2071, 0.2094, 0.7199],
        [0.9722, 0.9868, 0.3530]])
torch.Size([3, 2, 4])
torch.Size([3, 64, 4])
torch.Size([3, 64, 2])
torch.Size([3, 128])
torch.Size([3, 100])
torch.Size([3, 6])
torch.Size([3, 6])
```
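The `128` in `nn.Linear(128, 100)` has to match the flattened size leaving the pooling layer. Rather than finding it by trial, it can be computed from the output-length formula given in the PyTorch `Conv1d` and `MaxPool1d` documentation, `L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)`, where `MaxPool1d`'s stride defaults to its kernel size. A pure-Python sketch with a hypothetical helper `out_len`:

```python
def out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a Conv1d/MaxPool1d layer (formula from the PyTorch docs)."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


length = 4                                          # time steps in the experiment above
length = out_len(length, kernel_size=1)             # Conv1d(2, 64, kernel_size=1): 4
length = out_len(length, kernel_size=2, stride=2)   # MaxPool1d(2), stride defaults to 2: 2
flat = 64 * length                                  # channels * length after Flatten
print(flat)  # 128, the in_features needed by the first nn.Linear
```

With real recordings of thousands of samples, the same helper gives the `in_features` value directly instead of requiring a dry run through the network.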
