- This is the repo for machine learning experiments in computer vision with TensorFlow and TensorFlow.js. This class features experiments with browser-based machine learning.
![](https://private-user-images.githubusercontent.com/142470034/328259892-bcc99473-9c82-42c4-9361-49b62668165e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk3NTc0MzQsIm5iZiI6MTcxOTc1NzEzNCwicGF0aCI6Ii8xNDI0NzAwMzQvMzI4MjU5ODkyLWJjYzk5NDczLTljODItNDJjNC05MzYxLTQ5YjYyNjY4MTY1ZS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNjMwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDYzMFQxNDE4NTRaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0zNWNkOTA0YjAzMWIyOGM0NzZjNDJjZTEyMTVjOTQ1ZDhiNzBhNDc0NzMyOGYyNDM1ODFmYzhlN2IzZWQyODBhJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.JvSrBN8XMCjJ1nf4h-lyJFNVdeYqmDdt7ZByfa5nHb0)
![](https://private-user-images.githubusercontent.com/142470034/328260015-d6e65d93-d495-4b84-a860-ce1972063f38.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk3NTc0MzQsIm5iZiI6MTcxOTc1NzEzNCwicGF0aCI6Ii8xNDI0NzAwMzQvMzI4MjYwMDE1LWQ2ZTY1ZDkzLWQ0OTUtNGI4NC1hODYwLWNlMTk3MjA2M2YzOC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNjMwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDYzMFQxNDE4NTRaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT03ZWIzYmYxOTNiYjcxMTljOGEyYjM1MjU0YjczZTdlNzFmZWYwZjA5NDk5YjQ2OWIxZmJkYjEwZTZiM2JiYjI1JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.BdWpbrOOJR4fKhfiEQ6m0f4871TWuMszIwiGuSesMF4)
![](https://private-user-images.githubusercontent.com/142470034/328260074-b9fc581d-72ef-4297-b607-76cf77dcef4c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk3NTc0MzQsIm5iZiI6MTcxOTc1NzEzNCwicGF0aCI6Ii8xNDI0NzAwMzQvMzI4MjYwMDc0LWI5ZmM1ODFkLTcyZWYtNDI5Ny1iNjA3LTc2Y2Y3N2RjZWY0Yy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNjMwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDYzMFQxNDE4NTRaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0xMjUyYWM2MWI0NWU5ODNiMWQ5MzJkOTFiZjI3ZTM1ODVjYmViNjUyN2Y0N2NkY2UxNzM4ZTJiN2IzNjAwM2I0JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.NQ1eCaWICOOoq1rVtbh3qM_qVJA-A527RabaBg9J8T0)
![](https://private-user-images.githubusercontent.com/142470034/328260365-8222b9be-dd75-41dc-8157-1beedfaf1832.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk3NTc0MzQsIm5iZiI6MTcxOTc1NzEzNCwicGF0aCI6Ii8xNDI0NzAwMzQvMzI4MjYwMzY1LTgyMjJiOWJlLWRkNzUtNDFkYy04MTU3LTFiZWVkZmFmMTgzMi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNjMwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDYzMFQxNDE4NTRaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0xZmJlNjViM2RkOGYxZGRmNmQzMGEwZDA0NGY3MGNkYjk2MDYyOTdkY2I5NDUyYWM1YmM4MmI1NzJjNmRhMmFkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.Le2LnAOgqDyhCxU9UC4eKB-kClxFOMPYlvg-GlQEQrw)
User testing with two classes of 20+ students.
- Spearheaded real-time HTTPS communication for web and mobile, enabling live video transmission and AI prediction.
- Implemented neural networks leveraging TensorFlow hand pose recognition, achieving 80%+ accuracy in classification.
- Integrated an automated data collection system, enhancing user experience and boosting operational efficiency by 50%.
- What problem am I trying to address❓ I noticed how difficult it is to interact quickly with peers during multi-user livestream video calls (e.g. Zoom, Google Meet). For example, in an online class, a user who wants to raise a hand to ask a question has to click the emoji button -> select emoji -> deselect emoji (three steps) to complete the interaction with the professor.
- How can AI help to solve this problem ❓ An AI algorithm, potentially a computer vision model, can classify users’ hand postures and emit signals directly to peers.
- What data is needed to create an AI to help address the issue ❓ A series of input data that precisely captures human hand postures.
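As a rough illustration of what such input data could look like, the sketch below flattens one hand's landmarks into a single feature vector. It assumes 21 landmarks with `x`, `y`, `z` coordinates per hand, which is what the TensorFlow Handpose model reports; the function name `flatten_landmarks` and the sample values are hypothetical and not taken from this repo.

```python
from typing import List, Sequence

NUM_LANDMARKS = 21  # assumption: Handpose-style output, 21 keypoints per hand
DIMS = 3            # assumption: each keypoint has x, y, z


def flatten_landmarks(landmarks: Sequence[Sequence[float]]) -> List[float]:
    """Turn [[x, y, z], ...] into one flat list [x0, y0, z0, x1, ...]."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    features: List[float] = []
    for point in landmarks:
        if len(point) != DIMS:
            raise ValueError(f"expected {DIMS} coordinates per landmark")
        features.extend(float(c) for c in point)
    return features


# Hypothetical sample: 21 made-up keypoints
sample = [[i, i + 0.5, -i] for i in range(NUM_LANDMARKS)]
vector = flatten_landmarks(sample)
print(len(vector))  # 63 features per hand
```

A vector of 63 values is consistent with the first dense layer in the model summary further down (63 × 32 weights + 32 biases = 2048 parameters).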
This prototype is based on Daniel Shiffman's The Coding Train. I reduced the data collection wait time and extended the data collection duration, so that the system can automatically record more data samples at a time. This design improved the user experience of data collection.
![](https://private-user-images.githubusercontent.com/142470034/309437796-5381bde6-7286-4604-a902-7aa780815508.gif?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk3NTc0MzQsIm5iZiI6MTcxOTc1NzEzNCwicGF0aCI6Ii8xNDI0NzAwMzQvMzA5NDM3Nzk2LTUzODFiZGU2LTcyODYtNDYwNC1hOTAyLTdhYTc4MDgxNTUwOC5naWY_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNjMwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDYzMFQxNDE4NTRaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1iNGEwYzk3MmE3OTA0ODRjYzA0YjJmY2Y4MGZkZmQ2OTRkNzBhZTMzZWMyMzBmYWZhZjk3MTFhZWVlZTNkMDc2JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.hh67VwaCMI13zlxjeI8xG7lpyzDhxRhTPC3OT0oUfaM)
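The collection flow described above (a short wait, then a longer automatic recording window) can be sketched as follows. This is a minimal sketch, not the prototype's code: `collect_samples`, its parameters, and the timing values are hypothetical, and the clock and sleep functions are injected so the logic can be run without a webcam.

```python
import time
from typing import Callable, List, Tuple


def collect_samples(
    read_pose: Callable[[], List[float]],
    label: str,
    wait_s: float,
    duration_s: float,
    interval_s: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[List[float], str]]:
    """Wait `wait_s` seconds, then record one labeled sample every
    `interval_s` seconds until `duration_s` seconds have passed."""
    sleep(wait_s)  # gives the user time to get into the pose
    samples: List[Tuple[List[float], str]] = []
    start = clock()
    while clock() - start < duration_s:
        samples.append((read_pose(), label))
        sleep(interval_s)
    return samples


# Simulated clock so the example finishes instantly (hypothetical values)
class FakeClock:
    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


fake = FakeClock()
data = collect_samples(
    read_pose=lambda: [0.0] * 63,
    label="raise_hand",
    wait_s=2.0,
    duration_s=10.0,
    interval_s=0.5,
    clock=fake.time,
    sleep=fake.sleep,
)
print(len(data))
```

With these made-up values, a shorter `wait_s` and a longer `duration_s` yield more samples per recording, which is the trade-off described above.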
This prototype is based on TensorFlow Handpose and MediaPipe V2. It has higher performance and lower latency than the previous prototype.
![](https://private-user-images.githubusercontent.com/142470034/323051624-8ed19f1a-1c9a-4524-b6b2-fe356d3175f6.gif?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk3NTc0MzQsIm5iZiI6MTcxOTc1NzEzNCwicGF0aCI6Ii8xNDI0NzAwMzQvMzIzMDUxNjI0LThlZDE5ZjFhLTFjOWEtNDUyNC1iNmIyLWZlMzU2ZDMxNzVmNi5naWY_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNjMwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDYzMFQxNDE4NTRaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT02Y2I0ZDRlNWE2N2YzMzQ3Mjg3YTE0ZWIxYmNhMGNlOWQxNWNkZjJmZDUzNGMzZjkwZWUyYjBkNDQ3MTc4NDNkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.q-yKBznwiH6xNsgzwBWdOvKG3-WRJ9u0wYfBwii7y5s)
- Deep Learning model trained with Jupyter Notebook: Link
- util.py: Python functions to load data (load JSON data into NumPy arrays, shuffle data), preprocess data (slice X_train, y_train into training and validation sets), build the model (set up the neural network), and test the model.
- main.ipynb: main workflow to train the machine learning model step by step.
- Model summary:

Model: "sequential"

| Layer (type) | Output Shape | Param # |
| --- | --- | --- |
| dense (Dense) | (None, 32) | 2048 |
| dense_1 (Dense) | (None, 4) | 132 |

Total params: 2180 (8.52 KB). Trainable params: 2180 (8.52 KB). Non-trainable params: 0 (0.00 Byte).

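The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-output pair plus one bias per output. The input size of 63 is not stated in the summary; it is inferred here from the 2048 parameters of the first layer and should be read as an assumption.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus biases (n_units) of a dense layer."""
    return n_inputs * n_units + n_units


# Assumption: 63 input features (inferred from 2048 = (63 + 1) * 32)
n_features = 63
hidden = dense_params(n_features, 32)  # dense   -> (None, 32)
output = dense_params(32, 4)           # dense_1 -> (None, 4)

print(hidden, output, hidden + output)  # 2048 132 2180
```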
- Model accuracy: 0.89552241563797
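The data-handling steps listed for `util.py` (load JSON, shuffle, split into training and validation sets) might look roughly like the sketch below. It is not the repo's `util.py`: the JSON layout (`xs`, `ys` keys), the function names, and the 80/20 split are assumptions, and plain Python lists stand in for NumPy arrays to keep the example dependency-free.

```python
import json
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]


def load_samples(json_text: str) -> List[Sample]:
    """Parse records shaped like {"xs": [...], "ys": <class index>} (assumed layout)."""
    records = json.loads(json_text)
    return [(list(map(float, r["xs"])), int(r["ys"])) for r in records]


def shuffle_and_split(
    samples: List[Sample], val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle a copy of the samples, then slice off the validation set."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical data: 10 samples, 63 features each, 4 classes
raw = json.dumps([{"xs": [i] * 63, "ys": i % 4} for i in range(10)])
train, val = shuffle_and_split(load_samples(raw))
print(len(train), len(val))  # 8 2
```

Fixing the seed keeps the split reproducible between notebook runs; a different seed gives a different but equally sized split.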
- Linux commands:
  - `ssh root@qz2432.itp.io`
  - `root@qz2432.itp.io's password:`
  - `root@ruby-zhang:~# cd ./live-web/week5`
  - `root@ruby-zhang:~/live-web/week5# node server.js`

qz2432.itp.io:
![](https://private-user-images.githubusercontent.com/142470034/309438393-087eac2e-59b9-41b5-ba6d-401f7ec1e96f.gif?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk3NTc0MzQsIm5iZiI6MTcxOTc1NzEzNCwicGF0aCI6Ii8xNDI0NzAwMzQvMzA5NDM4MzkzLTA4N2VhYzJlLTU5YjktNDFiNS1iYTZkLTQwMWY3ZWMxZTk2Zi5naWY_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNjMwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDYzMFQxNDE4NTRaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1iZWQxYjRkYmZlYmFlNDI3YTI5YTVmYzA1MjA3YzIzNDk2NjEzYzgxNTA4NjhhMTlkZTVkNjZmNGYzOTYyNjBlJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.lgqs8JhnJWYsbjLAfNVxHjSf7B3o1bQ-sB0wW7IVPpo)
- Implementing a video chat application in the style of Zoom, Microsoft Teams, and Google Meet.
- Technology: WebRTC provides APIs for capturing audio and video streams from the user's camera and microphone. These streams can be transmitted in real-time between peers, enabling video and audio calls directly in the browser without the need for third-party plugins.
- Experience: Participants can join meetings via web browsers or dedicated applications on various devices.
- Live chat box created using the GSAP library and the DOM
![](https://private-user-images.githubusercontent.com/142470034/309440925-4d47690f-2ffb-412c-b572-b7f2fa7b1608.gif?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk3NTc0MzQsIm5iZiI6MTcxOTc1NzEzNCwicGF0aCI6Ii8xNDI0NzAwMzQvMzA5NDQwOTI1LTRkNDc2OTBmLTJmZmItNDEyYy1iNTcyLWI3ZjJmYTdiMTYwOC5naWY_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNjMwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDYzMFQxNDE4NTRaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT04NjZiNWQwY2Q1NDc1N2E2NmE1OWFiODcyYmEyOTQyZGM4NGJjOWI0NGE2Y2NkZjJjOWUxOTUyMmU0NzBhYTUxJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.P-yD0pcoQGQxC1gnHbKGumkiG1ednm_0d4GA6QrXB6g)
- Live video prototype using WebSocket
- I tested the web application on the webcams of my two laptops. This live video prototype is based on HTML and `<canvas>`. The WebSocket server receives canvas data and emits this data to all other clients. All clients then update the `src` of their `<img>` element.
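
The relay step described above (the server receives a frame from one client and forwards it to all the others) can be sketched independently of any socket library. This is a minimal in-memory sketch: `FrameHub`, the client ids, and the callback-based "sockets" are hypothetical stand-ins, not the server code of this prototype, which runs on Node.js.

```python
from typing import Callable, Dict

SendFn = Callable[[str], None]


class FrameHub:
    """Keeps track of connected clients and relays frames between them."""

    def __init__(self) -> None:
        self._clients: Dict[str, SendFn] = {}

    def connect(self, client_id: str, send: SendFn) -> None:
        self._clients[client_id] = send

    def disconnect(self, client_id: str) -> None:
        self._clients.pop(client_id, None)

    def on_frame(self, sender_id: str, data_url: str) -> int:
        """Forward a frame to every client except the sender; return the count."""
        delivered = 0
        for client_id, send in self._clients.items():
            if client_id != sender_id:
                send(data_url)
                delivered += 1
        return delivered


# Two simulated laptops; each "socket" just stores the last frame it received
received: Dict[str, str] = {}
hub = FrameHub()
hub.connect("laptop_a", lambda frame: received.__setitem__("laptop_a", frame))
hub.connect("laptop_b", lambda frame: received.__setitem__("laptop_b", frame))

hub.on_frame("laptop_a", "data:image/jpeg;base64,AAAA")  # placeholder payload
print(received)  # only laptop_b gets the frame
```

On the receiving side, each client would assign the received data to the `src` of the element that displays the remote video.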