## Main Table Metadata

|   Column Name  |  Data Type  |                                  Description                                  |                                                 Notes                                                |
|:--------------:|:-----------:|:-----------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------------:|
| Source         | String      | The name of the news source or publication from which the article originated. | This column has been label-encoded to create a unique identifier for each source.                    |
| Title          | String      | The title of the article.                                                     | No transformation documented.                                                                        |
| Headings       | String/List | The main headings or topics covered in the article.                           | Initially processed to remove quotes and then exploded for label encoding.                           |
| author         | String/List | The author(s) of the article.                                                 | Processed to standardize the format, separating multiple authors into a list. Exploded for encoding. |
| Published      | String      | The original published date and time of the article as a string.              | Used to extract Publish_Day, Publish_Month, Publish_Year, and Publish_Time.                          |
| Publish_Day    | Integer     | The day of the month when the article was published.                          | Extracted from the Published column and formatted.                                                   |
| Publish_Month  | String      | The month in which the article was published.                                 | Standardized month names (e.g., "Jun" to "June").                                                    |
| Publish_Year   | Integer     | The year in which the article was published.                                  | Extracted from the Published column.                                                                 |
| Publish_Time   | Time        | The time at which the article was published.                                  | Converted to a 24-hour format using regular expressions.                                             |
| Read_Time      | Integer     | The estimated reading time of the article.                                    | No transformation documented.                                                                        |
| Article        | String      | The full text or content of the article.                                      | No transformation documented.                                                                        |
| Published_date | Date        | The formatted date of publication.                                            | Constructed from Publish_Day, Publish_Month, and Publish_Year and converted to a datetime.           |
| [column_name]_id | Integer     | Unique identifier for each value of a label-encoded column (e.g., Source_id). | Created through label encoding.                                                                      |

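The notes above describe label-encoding `Source`, exploding multi-author rows before encoding, and splitting the `Published` string into day, month, year, and 24-hour time. The pandas sketch below illustrates those steps on a few hypothetical rows. The raw string formats (authors joined by commas or "and", dates such as `12 Jun 2024 3:45pm`) and the regular expressions are assumptions, not the notebook's actual code. `pd.factorize` stands in for label encoding here; scikit-learn's `LabelEncoder` would number values in sorted order instead of order of first appearance.

```python
import re

import pandas as pd

# Hypothetical sample rows; the real scraped formats of `author` and `Published` may differ.
df = pd.DataFrame({
    "Source": ["Outlet A", "Outlet B", "Outlet A"],
    "author": ["Jane Doe and John Smith", "Ann Lee", "Jane Doe"],
    "Published": ["12 Jun 2024 3:45pm", "3 July 2024 11:05am", "30 Jun 2024 12:10am"],
})

# Label-encode Source: factorize numbers values 0, 1, 2, ... by first appearance;
# source_names maps each code back to its original name.
df["Source_id"], source_names = pd.factorize(df["Source"])

# Split multiple authors into a list (assumed separators: commas or the word "and"),
# then explode to one author per row before encoding.
df["author"] = df["author"].apply(lambda s: re.split(r"\s*(?:,|\band\b)\s*", s))
authors = df.explode("author")
authors["author_id"], author_names = pd.factorize(authors["author"])

# Assumed raw layout: "<day> <month> <year> <h>:<mm><am|pm>".
PATTERN = (r"(?P<day>\d{1,2})\s+(?P<month>[A-Za-z]+)\s+(?P<year>\d{4})\s+"
           r"(?P<hour>\d{1,2}):(?P<minute>\d{2})\s*(?P<ampm>am|pm)")
parts = df["Published"].str.extract(PATTERN, flags=re.IGNORECASE)

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
FULL_NAME = {m[:3]: m for m in MONTHS}  # e.g. "Jun" -> "June"
MONTH_NUMBER = {m: i for i, m in enumerate(MONTHS, start=1)}

df["Publish_Day"] = parts["day"].astype(int)
df["Publish_Month"] = parts["month"].str[:3].str.title().map(FULL_NAME)
df["Publish_Year"] = parts["year"].astype(int)

# 12-hour to 24-hour clock: 12am -> 00, 12pm -> 12, 3pm -> 15.
hour24 = parts["hour"].astype(int) % 12 + (parts["ampm"].str.lower() == "pm") * 12
df["Publish_Time"] = pd.to_datetime(
    hour24.astype(str).str.zfill(2) + ":" + parts["minute"], format="%H:%M"
).dt.time

# Rebuild a real datetime from the cleaned day, month and year columns.
df["Published_date"] = pd.to_datetime(pd.DataFrame({
    "year": df["Publish_Year"],
    "month": df["Publish_Month"].map(MONTH_NUMBER),
    "day": df["Publish_Day"],
}))

print(df[["Source_id", "Publish_Day", "Publish_Month", "Publish_Year",
          "Publish_Time", "Published_date"]])
print(authors[["author", "author_id"]])
```

Note that `Jane Doe` receives the same `author_id` in both articles, which is what lets the exploded rows feed a separate author lookup table.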
## Relational Tables Metadata

|   Table Name    |       Columns       | Data Type |                                  Description                                  | Notes |
|:---------------:|:-------------------:|:---------:|:-----------------------------------------------------------------------------:|:-----:|
| Source          | Source_id           | Integer   | Unique identifier for each news source.                                       | Primary key. |
|                 | Source              | String    | The name of the news source or publication.                                   | A descriptive name of the source. |
||||||
| Author          | author_id           | Integer   | Unique identifier for each author.                                            | Primary key. |
|                 | author              | String/List | The author(s) of the article.                                                | Can include multiple authors, separated into a list. |
||||||
| Published       | Published_date_id   | Integer   | Unique identifier for each publication date.                                  | Primary key. |
|                 | Publish_Day         | Integer   | The day of the month when the article was published.                          | Derived from the Published column. |
|                 | Publish_Month       | String    | The month in which the article was published.                                 | Standardized month names. |
|                 | Publish_Year        | Integer   | The year in which the article was published.                                  | Extracted from the Published column. |
|                 | Publish_Time        | Time      | The time at which the article was published.                                  | Converted to a 24-hour format. |
|                 | Read_Time           | Integer   | The estimated reading time of the article.                                    | No transformation documented. |
|                 | Publish_Date        | Date      | The formatted date of publication.                                            | Constructed from Publish_Day, Publish_Month, and Publish_Year and converted to a datetime. |
||||||
| Articles        | Article             | String    | The full text or content of the article.                                      | The primary text content of the article. |
|                 | Article_id          | Integer   | Unique identifier for each article.                                           | Primary key. |
|                 | Title               | String    | The title of the article.                                                     | Each title should be unique for easy referencing. |
|                 | Published_date_id   | Integer   | Foreign key referencing the unique identifier for each publication date.      | Links to the Published table for detailed date information. |
|                 | Source_id           | Integer   | Foreign key referencing the unique identifier for each news source.           | Links to the Source table. |
|                 | author_id           | Integer   | Foreign key referencing the unique identifier for each author.                | Links to the Author table. |
||||||
| Article_author  | Article_id          | Integer   | Foreign key referencing the unique identifier for each article.               | Links to the Articles table. |
|                 | Author_id           | Integer   | Foreign key referencing the unique identifier for each author.                | Links to the Author table. |
||||||
| Article_word    | word_id             | Integer   | Foreign key referencing the unique identifier for each word.                  | Links to word_table. |
|                 | Article_id          | Integer   | Foreign key referencing the unique identifier for each article.               | Links to the Articles table. |
||||||
| article_table_s | Article_id          | Integer   | Unique identifier for each article.                                           | Primary key. |
|                 | [Article, Title] Sentiment Label | String    | The sentiment label assigned to the article text or title.      | Positive, Negative, or Neutral (BERT, RoBERTa). |
|                 | [Article, Title] Sentiment_Score | Float     | The sentiment score computed for the article text or title.     | Ranges from -1 (negative) to 1 (positive) (BERT, RoBERTa). |
||||||
| word_table      | word_id             | Integer   | Unique identifier for each word.                                              | Primary key. |
|                 | Word                | String    | The word itself.                                                              | |
|                 | Article_Frequency   | Integer   | The frequency of the word in articles.                                        | |
|                 | Title_Frequency     | Integer   | The frequency of the word in titles.                                          | |
|                 | Article_IDs         | List      | List of article IDs where the word appears.                                   | |
||||||
| topic_table     | Topic_id            | Integer   | Unique identifier for each topic.                                             | Primary key. |
|                 | Topic               | String    | The topic or theme identified in the articles.                                | |
|                 | Topic Frequency     | Integer   | The frequency of this topic across the articles.                              | |

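As a sketch of how the core relational tables fit together, the following creates `Source`, `Author`, `Published`, `Articles`, and `Article_author` in an in-memory SQLite database and joins them back into one row per article author. SQLite is an assumed backend, not necessarily the one the notebook uses: it has no `TIME` or `DATE` column type, so those columns are stored as text, and the `NOT NULL`, `UNIQUE`, and composite-key constraints are assumptions. The sample rows are hypothetical.

```python
import sqlite3

# Column names follow the metadata tables; SQL types and constraints are assumptions.
SCHEMA = """
CREATE TABLE Source (
    Source_id INTEGER PRIMARY KEY,
    Source    TEXT NOT NULL
);
CREATE TABLE Author (
    author_id INTEGER PRIMARY KEY,
    author    TEXT NOT NULL
);
CREATE TABLE Published (
    Published_date_id INTEGER PRIMARY KEY,
    Publish_Day       INTEGER,
    Publish_Month     TEXT,
    Publish_Year      INTEGER,
    Publish_Time      TEXT,     -- no TIME type in SQLite: stored as 'HH:MM'
    Read_Time         INTEGER,
    Publish_Date      TEXT      -- ISO date string, e.g. '2024-06-12'
);
CREATE TABLE Articles (
    Article_id        INTEGER PRIMARY KEY,
    Title             TEXT UNIQUE,
    Article           TEXT,
    Published_date_id INTEGER REFERENCES Published (Published_date_id),
    Source_id         INTEGER REFERENCES Source (Source_id),
    author_id         INTEGER REFERENCES Author (author_id)
);
-- Junction table: one row per (article, author) pair, so co-authored articles are covered.
CREATE TABLE Article_author (
    Article_id INTEGER REFERENCES Articles (Article_id),
    Author_id  INTEGER REFERENCES Author (author_id),
    PRIMARY KEY (Article_id, Author_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)

# Hypothetical sample rows.
conn.executemany("INSERT INTO Source VALUES (?, ?)", [(0, "Outlet A"), (1, "Outlet B")])
conn.executemany("INSERT INTO Author VALUES (?, ?)", [(0, "Jane Doe"), (1, "John Smith")])
conn.execute("INSERT INTO Published VALUES (0, 12, 'June', 2024, '15:45', 4, '2024-06-12')")
conn.execute("INSERT INTO Articles VALUES (0, 'Sample title', 'Sample text', 0, 0, 0)")
conn.executemany("INSERT INTO Article_author VALUES (?, ?)", [(0, 0), (0, 1)])

# Join back to one row per article author.
rows = conn.execute("""
    SELECT a.Title, s.Source, au.author, p.Publish_Date
    FROM Articles AS a
    JOIN Source AS s          ON s.Source_id = a.Source_id
    JOIN Published AS p       ON p.Published_date_id = a.Published_date_id
    JOIN Article_author AS aa ON aa.Article_id = a.Article_id
    JOIN Author AS au         ON au.author_id = aa.Author_id
    ORDER BY au.author_id
""").fetchall()
print(rows)
```

Going through `Article_author` rather than the single `author_id` column on `Articles` is what returns both authors of a co-written article.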

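The `word_table` and `Article_word` columns can be derived with simple counting. The sketch below assumes lowercase regex tokenization with no stop-word removal, treats `Article_Frequency` and `Title_Frequency` as total occurrence counts, and takes `Article_IDs` from article bodies only; the notebook's actual tokenizer and counting rules are not specified in these tables, and the sample articles are hypothetical.

```python
import re
from collections import Counter

# Hypothetical articles keyed by Article_id: (Title, Article text).
articles = {
    0: ("Election results announced", "Results show a close election."),
    1: ("Polls close early", "Voters queued before the polls opened."),
}


def tokenize(text):
    """Lowercase word tokens; an assumed stand-in for the notebook's tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


article_freq, title_freq = Counter(), Counter()
article_ids = {}  # word -> list of Article_ids whose body contains the word
for article_id, (title, text) in articles.items():
    title_freq.update(tokenize(title))
    tokens = tokenize(text)
    article_freq.update(tokens)
    for word in set(tokens):
        article_ids.setdefault(word, []).append(article_id)

# word_table: one row per distinct word seen in any title or body.
vocabulary = sorted(set(article_freq) | set(title_freq))
word_table = [
    {"word_id": i, "Word": w, "Article_Frequency": article_freq[w],
     "Title_Frequency": title_freq[w], "Article_IDs": sorted(article_ids.get(w, []))}
    for i, w in enumerate(vocabulary)
]
word_id = {row["Word"]: row["word_id"] for row in word_table}

# Article_word link table: one (word_id, Article_id) pair per word occurring in a body.
article_word = sorted((word_id[w], a) for w, ids in article_ids.items() for a in ids)

for row in word_table:
    print(row)
print(article_word)
```

In this sketch `Article_word` is simply the normalized form of the `Article_IDs` lists, so either can be rebuilt from the other.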