### <center>YouTube channel video information crawler</center>

Function description: Crawl the information of all videos in a `YouTube` channel, including the video ID, title, description, and publication time, and save it as a `csv` file.

Dependency package installation: Please ensure that the `google-api-python-client` library has been installed. If not, install it with the command:
``` bash
pip install google-api-python-client
```
Running the `.ipynb` file requires the `ipykernel` package. Run the following command to install `ipykernel` into the `Python` environment (here, a conda environment named `reptile`):
``` bash
conda install -n reptile ipykernel --update-deps --force-reinstall
```

In [5]:
import os  # Import Python's built-in os module
import csv  # Import Python's built-in csv module
from googleapiclient.discovery import build
# Import the build function from the discovery module in the googleapiclient package to access Google API services

#### Get YouTube API Key

1. Visit [Google Cloud Platform Console](https://console.cloud.google.com/), and log in to your Google account.

![](assets/屏幕截图-2023-05-05-144025.png)

On first login, you will be asked to choose where to centrally create and manage your `Google Cloud` instances, disks, networks, and other resources.

![](assets/屏幕截图-2023-05-05-144906.png)


2. If you already have a project, select one from the Project drop-down list.

![](assets/屏幕截图-2023-05-05-145149.png)

If you haven't created a project yet, click the "Create Project" button and enter a project name and other relevant information.

![](assets/屏幕截图-2023-05-05-145338.png)


3. In the left navigation bar, click the Navigation Menu icon, then select APIs & Services > Library.

![](assets/屏幕截图-2023-05-05-145820.png)

Type `YouTube Data API v3` in the search box and click `YouTube Data API v3` in the search results.

![](assets/屏幕截图-2023-05-05-145926.png)

![](assets/屏幕截图-2023-05-05-150024.png)


4. Click the "Enable" button to enable the `API`.

![](assets/屏幕截图-2023-05-05-150051.png)

5. Once the `API` is enabled, you will be redirected to the `API` overview page. On this page, click the "Create Credentials" button.

![](assets/屏幕截图-2023-05-05-150201.png)


6. On the Create Credential page, select Public Data as the credential type. The system will automatically generate a new API key.

![](assets/屏幕截图-2023-05-05-150757.png)


7. Copy the newly generated `API` key and paste it into the `Python` code as the value of `api_key`.

![](assets/屏幕截图-2023-05-05-150854.png)


8. You now have a `YouTube API Key` and can use it in your code to access the `YouTube Data API`. Keep the key safe and do not share it with others. If needed, you can manage and delete your `API Keys` on the APIs & Services > Credentials page of the `Google Cloud Platform` console.

In [6]:
api_key = ""  # YouTube API key (paste your own key here)
# Use Google API to build a client object to access YouTube video service
youtube = build("youtube", "v3", developerKey=api_key)
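Since the key should not be shared, one option is to keep it out of the notebook and read it from an environment variable instead. The following is a minimal sketch, not part of the original notebook; the function name `load_api_key` and the variable name `YOUTUBE_API_KEY` are assumptions chosen for illustration.

```python
import os


def load_api_key(env_var="YOUTUBE_API_KEY", environ=os.environ):
    """Return the API key stored in an environment variable.

    Both the function name and the default variable name are hypothetical;
    use whatever name fits your own setup.
    """
    key = environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(
            f"Set the {env_var} environment variable to your YouTube API key"
        )
    return key
```

With the variable set in the shell that starts Jupyter, `api_key = load_api_key()` could replace the hard-coded assignment above.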

In [7]:
def get_channel_videos(channel_id):
    """
    Purpose: get the information of all videos in the specified YouTube channel
    Parameters: channel_id, the ID of the channel whose videos are fetched
    Return value: a list, videos, containing the information of every video
    """
    # Define the empty list video_ids to save the video id and the variable next_page_token to store the token of the next page data
    video_ids = []
    next_page_token = None
    # Loop to get the next page of video information
    while True:
        # Use the search().list() method of the YouTube client to build a request to the YouTube API
        request = youtube.search().list(
            # Only the id part is requested here; the details are fetched later through videos().list()
            part="id",
            channelId=channel_id,  # Channel ID to be queried
            maxResults=50,  # The maximum number of videos returned per page
            pageToken=next_page_token,  # The token of the next page of data to be obtained
            # The type of resource to be queried (videos only)
            type="video",
        )
        # Execute the query request and save the result to the response variable
        response = request.execute()
        # Extract the video ID from the YouTube API response and add it to the existing list of video IDs
        video_ids.extend([item["id"]["videoId"] for item in response["items"]])
        # Get the value of the next page token in the YouTube API response
        next_page_token = response.get("nextPageToken")
        # Exit the loop when there is no next page of data
        if next_page_token is None:
            break
    videos = []  # Create an empty list videos for storing video details
    for video_id in video_ids:  # Traverse the list of video IDs collected above
        request = youtube.videos().list(  # Get details for a specific video
            part="snippet,contentDetails,player,statistics,status", id=video_id)
        response = request.execute()  # Response contains requested video details
        # Add the video information in the response to the videos list through the extend() method
        videos.extend(response["items"])
    # Return the list of all video information
    return videos
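The function above sends one `videos().list()` request per video. The `id` parameter of that method accepts a comma-separated list of video IDs, so the requests could be grouped. The sketch below only shows the grouping step; the helper names `chunked` and `build_id_batches` are hypothetical, and the batch size of 50 is an assumption based on the page size used earlier, so check it against the API documentation before relying on it.

```python
def chunked(items, size=50):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_id_batches(video_ids, size=50):
    """Return comma-separated ID strings, one string per batch of video IDs."""
    return [",".join(batch) for batch in chunked(video_ids, size)]
```

Each returned string could then be passed as `id=...` to `youtube.videos().list()`, which reduces the number of requests for a large channel.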

To learn about the properties of a YouTube video resource and what each one means, please refer to https://developers.google.com/youtube/v3/docs/videos

![](assets/屏幕截图-2023-05-07-004356.png)
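One property worth a closer look is `contentDetails.duration`, which the API returns as an ISO 8601 duration string such as `PT4M13S`. If a numeric value is more convenient for later analysis, it can be converted to seconds. The following is a rough sketch under the assumption that only day, hour, minute, and second components appear; the function name `duration_to_seconds` is hypothetical.

```python
import re

# Matches strings such as "PT4M13S", "PT1H2M3S" or "P1DT2H"
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_seconds(duration):
    """Convert an ISO 8601 duration string to a whole number of seconds."""
    match = _DURATION_RE.match(duration)
    if match is None:
        raise ValueError(f"Unsupported duration format: {duration!r}")
    # Components missing from the string count as zero
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return (
        parts["days"] * 86400
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )
```

For example, `duration_to_seconds("PT4M13S")` returns `253`.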

In [8]:
def save_to_csv(videos, csv_name):
    """
    Purpose: save the specified YouTube video information to a CSV file
    Parameters: videos is the list of video information to be saved, csv_name is the name of the CSV file to write
    Return value: none
    """
    # Use Python's built-in open() method to open the specified file
    with open(csv_name, mode="w", newline="", encoding="utf-8") as file:
        # Create a csv.writer object to write CSV format data
        writer = csv.writer(file)
        # Use the writerow() method to write the header row
        writer.writerow(["id", "title", "publishedAt", "duration", "definition", "caption", "licensedContent",
                        "viewCount", "likeCount", "commentCount", "description", "embeddable", "player"])
        # Write the line corresponding to each video
        for video in videos:
            # The details of the video are as follows
            writer.writerow([
                # The ID that YouTube uses to uniquely identify the video
                video["id"],
                video["snippet"]["title"],  # video title
                video["snippet"]["publishedAt"],  # Published date and time
                video["contentDetails"]["duration"],  # video duration
                video["contentDetails"]["definition"],  # Definition
                # Whether there are subtitles
                video["contentDetails"]["caption"],
                # Whether the video represents licensed content (claimed by a YouTube content partner)
                video["contentDetails"]["licensedContent"],
                video["statistics"].get("viewCount", "N/A"),  # number of views
                video["statistics"].get("likeCount", "N/A"),  # Number of likes
                video["statistics"].get(
                    "commentCount", "N/A"),  # Number of comments
                video["snippet"]["description"],  # video description
                # Whether the video can be embedded in a web page for playback
                video["status"]["embeddable"],
                # The embed code (an <iframe> tag) for the video, which can be inserted into a webpage to play the video
                video["player"]["embedHtml"],
            ])


#### Get YouTube Channel ID

1. Open the `YouTube` channel page you want to scrape.

![](assets/屏幕截图-2023-05-05-164509.png)

2. Press `F12`, or right-click the page and choose "Inspect" (`Inspect Element`), to open the browser's developer tools.

![](assets/屏幕截图-2023-05-05-152759.png)

3. Find the search bar or use the shortcut key `Ctrl+F` to search for "`/channel/`". The channel ID is the string of characters that follows "`/channel/`" in the URL, for example:

    - /channel/UCoC47do520os_4DBMEFGg4A

    In this example, the channel ID is "`UCoC47do520os_4DBMEFGg4A`".

![](assets/屏幕截图-2023-05-05-164639.png)


4. Copy the channel `ID` into the `Python` code as the value of `channel_id`.
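The manual search described in the steps above can also be expressed as a small regular expression applied to a URL or to a copy of the page source. This is a sketch only: the function name `extract_channel_id` is hypothetical, and the pattern assumes that channel IDs are 24 characters long and start with `UC`, as the example ID does.

```python
import re

# "/channel/" followed by "UC" and 22 more URL-safe characters (assumed format)
_CHANNEL_ID_RE = re.compile(r"/channel/(UC[0-9A-Za-z_-]{22})")


def extract_channel_id(text):
    """Return the first channel ID found in `text`, or None if there is none."""
    match = _CHANNEL_ID_RE.search(text)
    return match.group(1) if match else None
```

For example, `extract_channel_id("https://www.youtube.com/channel/UCoC47do520os_4DBMEFGg4A")` returns `"UCoC47do520os_4DBMEFGg4A"`.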

In [9]:
def run():
    # YouTube channel name
    channel_name = "李子柒 Liziqi"
    # YouTube channel ID
    channel_id = "UCoC47do520os_4DBMEFGg4A"
    # Get all video information of the specified YouTube channel
    videos = get_channel_videos(channel_id)
    # Save the specified YouTube video information to a csv file
    save_to_csv(videos, channel_name+".csv")
    # Output the result of saving file information
    print(f"Video information has been saved to {channel_name}.csv")


In [10]:
# run the program
if __name__ == "__main__":
    run()

Video information has been saved to 李子柒 Liziqi.csv


#### Fix garbled characters when opening csv files with Excel

1. Create a new Excel file

![](assets/屏幕截图-2023-05-07-002333.png)


2. Switch to the "Data" menu, choose "From Text" as the data source, and select the exported CSV file

![](assets/屏幕截图-2023-05-07-002433.png)


3. In the text import wizard that appears, select "65001: Unicode (UTF-8)" as the file origin, select "Comma" as the delimiter, and finally click Load

![](assets/屏幕截图-2023-05-07-002517.png)


4. The data is now decoded correctly and can be saved as an xlsx file

![](assets/屏幕截图-2023-05-07-002608.png)
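As an alternative to the import wizard, the file can be written with a UTF-8 byte order mark (BOM), which Excel generally uses to recognize the encoding when the file is opened directly. Python's `utf-8-sig` codec writes that mark automatically. The sketch below is an assumption about how this could be applied, not part of the original notebook; the function name `write_csv_for_excel` is hypothetical.

```python
import csv


def write_csv_for_excel(rows, csv_name):
    """Write rows to a CSV file that starts with a UTF-8 byte order mark."""
    # "utf-8-sig" prepends the BOM bytes EF BB BF to the file
    with open(csv_name, mode="w", newline="", encoding="utf-8-sig") as file:
        csv.writer(file).writerows(rows)
```

In `save_to_csv`, the equivalent change would be to pass `encoding="utf-8-sig"` to `open()` instead of `encoding="utf-8"`.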