I am a fresh graduate with a strong interest in coding, looking for opportunities to apply my knowledge and skills in a professional setting! I have worked with companies such as EDP Renewables and Smart Nation Translational Laboratories on audio processing and data cleaning projects, which are described in more detail below.
- 👀 I’m interested in becoming a better programmer!
- 👨‍💻 I have some experience with SQL, Java, and C.
- 🚧 I have a strong foundation in Python for data cleaning and data validation!
- 🦕 I’m looking to collaborate on projects where I can learn and put my skills to use.
- 📫 How to reach me: ACHIANG004@e.ntu.edu.sg
- Database Management: SQL, MySQL
- Audio Pre-Processing: Python, JupyterLab, soundfile, noisereduce, cv2, converter, pydub, youtube_dl, numpy, pandas, shutil, subprocess, openpyxl, moviepy, tqdm, ffmpeg
- Data Visualization: VS Code, Python, Seaborn, Matplotlib
- Data Cleaning: VS Code, Python, numpy, pandas, json, os, glob, csv, shutil, re, sys, datetime
- Conducted video processing using Python, JupyterLab & Audacity to automate the conversion of audio file formats and to download large audio and video datasets from an online repository.
- Performed data cleaning by normalizing audio waveforms and reducing background noise in audio datasets for future Machine Learning projects.
- Developed an automated process using ffmpeg to accurately crop over 400 video files, using the unique timestamps listed for each file in an external Excel file.
- Used Python’s json, pandas, numpy, re, csv, shutil, os, datetime and glob libraries to develop a data pre-processing pipeline handling data quality, data transformation, metadata generation and data publication.
- Employed the pipeline to identify missing values & outliers, transform logarithmic & time series data, perform data aggregation, execute time gap analysis and construct a data catalogue.
- Produced and maintained user, design and test procedure documentation to support the pipeline development process.
- Used matplotlib to validate historical energy data, successfully identifying data dropouts and permanent step changes in our data collection system.
- Used matplotlib to conduct correlation analysis on time series data.
- Designed Entity Relationship (ER) diagrams from data relationship sentences, gaining a better understanding of attributes, unique attributes, entity instances and partial keys.
- Designed Relational Schemas from their respective ER diagrams and data relationship sentences, gaining a better understanding of tables, primary keys, foreign keys and many-to-many relationships.
- Used SQL to extract, sort and identify data of interest from a real-world dataset (Dognition database). Wrote queries and subqueries using clauses such as GROUP BY, FULL JOIN, DISTINCT and HAVING.
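
As a small illustration of the kind of query described in the last bullet, here is a minimal sketch that combines a subquery with `DISTINCT`, `GROUP BY` and `HAVING`. It runs against an in-memory SQLite database through Python's built-in `sqlite3` module. The `dogs` table, its columns, the sample rows and the `breeds_with_min_dogs` helper are all made up for this example; they are not the actual Dognition schema or data.

```python
import sqlite3

# Hypothetical rows (dog_id, breed, tests_completed), invented for illustration.
ROWS = [
    ("d1", "Labrador", 12),
    ("d2", "Labrador", 30),
    ("d3", "Poodle", 8),
    ("d4", "Poodle", 25),
    ("d5", "Beagle", 40),
]


def breeds_with_min_dogs(min_dogs):
    """Return (breed, dog_count, avg_tests) for breeds that have at least
    `min_dogs` dogs with 10 or more completed tests."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE dogs (dog_id TEXT, breed TEXT, tests_completed INTEGER)"
        )
        conn.executemany("INSERT INTO dogs VALUES (?, ?, ?)", ROWS)
        query = """
            SELECT breed,
                   COUNT(DISTINCT dog_id) AS dog_count,
                   AVG(tests_completed)   AS avg_tests
            FROM dogs
            -- Subquery: keep only dogs that completed at least 10 tests.
            WHERE dog_id IN (
                SELECT dog_id FROM dogs WHERE tests_completed >= 10
            )
            GROUP BY breed
            -- HAVING filters whole groups after aggregation.
            HAVING COUNT(DISTINCT dog_id) >= ?
            ORDER BY breed
        """
        return conn.execute(query, (min_dogs,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(breeds_with_min_dogs(2))
```

The `WHERE` clause filters individual rows before grouping, while `HAVING` filters the grouped results, which is why the minimum-dog threshold sits in `HAVING`.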