YouTube Data Scraper

This repository contains a Python script that collects data about YouTube videos, including captions. It fetches up to 500 top results for any genre or search term through the YouTube Data API, extracts metadata for each video, and saves it to a CSV file.

Features

Dynamic Genre Input: Accepts any genre or search term at runtime, so the input can be changed without editing the script.
Video Metadata Extraction: Gathers the following details for each video:
- Video URL
- Title
- Description
- Channel Title
- Keyword Tags
- YouTube Video Category
- Topic Details
- Video Published At
- Video Duration
- View Count
- Comment Count
- Captions Available (true/false)
- Caption Text
- Location of Recording
Captions Handling: If captions are available, they are fetched and included in the CSV file.
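
The captions step could look roughly like the sketch below. It assumes a transcript arrives as a list of segments that each carry a `text` field, which is the shape older releases of youtube_transcript_api return from `YouTubeTranscriptApi.get_transcript`; newer releases may differ. The names `captions_for`, `fetch_transcript`, and `fake_fetch` are hypothetical and not taken from main.py. The fetcher is passed in as an argument so the example runs without network access.

```python
from typing import Callable, Dict, List, Tuple

# Assumed segment shape: {"text": "...", "start": 0.0, ...}
Segment = Dict[str, object]


def captions_for(
    video_id: str,
    fetch_transcript: Callable[[str], List[Segment]],
) -> Tuple[bool, str]:
    """Return (captions_available, caption_text) for one video."""
    try:
        segments = fetch_transcript(video_id)
    except Exception:
        # Captions missing or disabled: keep the video, leave the text empty.
        return False, ""
    text = " ".join(str(seg.get("text", "")) for seg in segments)
    # Collapse line breaks and repeated spaces so the text fits one CSV cell.
    text = " ".join(text.split())
    return bool(text), text


def fake_fetch(video_id: str) -> List[Segment]:
    """Hypothetical stand-in for a real transcript fetcher."""
    if video_id == "no_captions":
        raise LookupError("no transcript for this video")
    return [
        {"text": "Hello\nworld", "start": 0.0},
        {"text": "again", "start": 1.5},
    ]


if __name__ == "__main__":
    print(captions_for("abc123", fake_fetch))
    print(captions_for("no_captions", fake_fetch))
```

Catching a broad exception keeps one video without captions from stopping a 500-video run; a real script may prefer to catch the library's specific exception types.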

Assignment Requirements Addressed

Dynamic Genre Support:
- The script prompts the user for a genre or search term at runtime, so no code changes are needed to switch to a different input.
Top 500 Videos:
- Fetches up to 500 videos for the specified genre or search term.
- Uses pagination to ensure comprehensive data collection.
CSV Export:
- All collected data points are saved into a CSV file for easy review and submission.
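
The pagination mentioned under Top 500 Videos can be sketched as a loop that follows `nextPageToken` until the limit is reached or the results run out. This is a minimal sketch, not the code in main.py: `collect_video_ids`, `make_search_fetcher`, and `fake_fetch_page` are hypothetical names, and the page fetcher is injected so the loop can be tried without an API key. `make_search_fetcher` shows how the loop might be wired to a googleapiclient client through `search().list(...)`, which returns at most 50 results per page.

```python
from typing import Callable, Dict, List, Optional

Page = Dict[str, object]


def collect_video_ids(
    fetch_page: Callable[[Optional[str]], Page],
    limit: int = 500,
) -> List[str]:
    """Follow nextPageToken until `limit` IDs are collected or pages run out."""
    ids: List[str] = []
    token: Optional[str] = None
    while len(ids) < limit:
        page = fetch_page(token)
        for item in page.get("items", []):
            video_id = item.get("id", {}).get("videoId")
            if video_id and len(ids) < limit:
                ids.append(video_id)
        token = page.get("nextPageToken")
        if not token:
            break  # no further pages for this search
    return ids


def make_search_fetcher(youtube, query: str) -> Callable[[Optional[str]], Page]:
    """Wrap a googleapiclient YouTube client (assumed to be built elsewhere)."""
    def fetch_page(token: Optional[str]) -> Page:
        params = {"part": "id", "q": query, "type": "video", "maxResults": 50}
        if token:
            params["pageToken"] = token
        return youtube.search().list(**params).execute()
    return fetch_page


def fake_fetch_page(token: Optional[str]) -> Page:
    """Hypothetical stand-in: three pages with two videos each."""
    pages = {
        None: {"items": [{"id": {"videoId": "a"}}, {"id": {"videoId": "b"}}],
               "nextPageToken": "p2"},
        "p2": {"items": [{"id": {"videoId": "c"}}, {"id": {"videoId": "d"}}],
               "nextPageToken": "p3"},
        "p3": {"items": [{"id": {"videoId": "e"}}, {"id": {"videoId": "f"}}]},
    }
    return pages[token]


if __name__ == "__main__":
    print(collect_video_ids(fake_fetch_page, limit=5))
```

With a real client, the call would be along the lines of `collect_video_ids(make_search_fetcher(youtube, "music"))`, where `youtube` is a client object created with the API key.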

Dependencies

The script uses the following libraries:

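googleapiclient: To interact with the YouTube Data API (installed with pip as google-api-python-client).
youtube_transcript_api: To fetch captions for videos (installed with pip as youtube-transcript-api).
requests: For making API requests.
csv: To save the extracted data in CSV format (Python standard library, no installation needed).
os: For file handling (Python standard library, no installation needed).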

Install the dependencies using:

pip install -r requirements.txt

Usage

Clone the repository:

git clone https://github.com/yourusername/youtube-data-scraper.git
cd youtube-data-scraper

Replace the placeholder in the script with your YouTube Data API Key:
```
API_KEY = "YOUR_API_KEY"
```
Run the script:
```
python main.py
```
Enter the genre or search term when prompted:
```
Enter a genre or search term: music
```
The script will fetch the data and save it in a CSV file named youtube_videos.csv.

Output

The script generates a youtube_videos.csv file with the following columns:

- Video URL
- Title
- Description
- Channel Title
- Keyword Tags
- YouTube Video Category
- Topic Details
- Video Published At
- Video Duration
- View Count
- Comment Count
- Captions Available
- Caption Text
- Location of Recording
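
As an illustration of how rows with these columns could be written, here is a minimal sketch using `csv.DictWriter` from the standard library. The helper name `write_rows` and the sample row are hypothetical, and the actual logic in main.py may differ. The example writes to an in-memory buffer so it runs without touching the file system.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

COLUMNS = [
    "Video URL", "Title", "Description", "Channel Title", "Keyword Tags",
    "YouTube Video Category", "Topic Details", "Video Published At",
    "Video Duration", "View Count", "Comment Count", "Captions Available",
    "Caption Text", "Location of Recording",
]


def write_rows(rows: Iterable[Dict[str, object]], out: TextIO) -> int:
    """Write a header plus one line per row; return the number of rows."""
    # restval fills columns a row lacks; extrasaction drops unknown keys.
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="",
                            extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    sample = [{
        "Video URL": "https://www.youtube.com/watch?v=abc123",  # made-up ID
        "Title": "Example, with a comma",
        "Captions Available": True,
    }]
    buffer = io.StringIO()
    write_rows(sample, buffer)
    print(buffer.getvalue())
```

To write a real file, the same function could be called with `open("youtube_videos.csv", "w", newline="", encoding="utf-8")`; `newline=""` is what the csv module documentation recommends so that line endings are not doubled on Windows.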

Limitations

YouTube API Quotas: Every request counts against the daily quota of the Google Cloud project that owns the API key. Once the quota is used up, further requests fail until it resets.
Runtime: Fetching 500 videos can take several minutes depending on API response times and network connectivity.
Captions Availability: Not all videos have captions, and in some cases, captions might not be available in English.
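
To get a feel for the quota limitation, the rough estimate below may help. It is a back-of-the-envelope sketch under stated assumptions: the unit costs (100 per `search.list` call, 1 per `videos.list` call) and the default allocation of 10,000 units per day are the values in the YouTube Data API quota documentation at the time of writing and may change, and it assumes the script fetches video details in batches of 50 IDs per call, which main.py may or may not do. The function name `estimate_units` is hypothetical.

```python
import math

# Assumed values; check the quota page of your own project.
SEARCH_COST = 100     # units per search.list call
VIDEOS_COST = 1       # units per videos.list call
DAILY_QUOTA = 10_000  # default units per day
PAGE_SIZE = 50        # maximum results per page / IDs per videos.list call


def estimate_units(video_count: int) -> int:
    """Estimate quota units for one run that collects `video_count` videos."""
    pages = math.ceil(video_count / PAGE_SIZE)
    # One search call and one batched details call per page of results.
    return pages * (SEARCH_COST + VIDEOS_COST)


if __name__ == "__main__":
    units = estimate_units(500)
    print(f"Estimated units for 500 videos: {units}")
    print(f"Full runs per day under the assumed quota: {DAILY_QUOTA // units}")
```

Under these assumptions the search calls dominate the cost, so a handful of full runs per day is realistic but repeated testing against the live API can exhaust the quota quickly.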

Submission Guidelines

To submit the assignment:

Run the script for the specified genre.
Submit the generated youtube_videos.csv file with all the required data points.

License

This project is open source and available under the MIT License.

Acknowledgments

Thanks to YouTube for providing API access.
Inspired by the requirements of the Data Scraping Internship assignment.


Repository: deepakthecoder1982/YoutubeDataScraping
