# URL Search for Production Server

This notebook fetches and processes URLs from a production WordPress server. It performs three tasks:

1. Retrieves the multisite or primary site URLs from the WordPress database and saves them to a CSV file with the `fetch_multisite_or_primary_site_urls` function.

2. Reads the generated site URLs and fetches media data for each site with the `fetch_media_data_per_site` function, writing the results to a designated output directory.

3. Looks up where the fetched media is used in a site's posts table with the `find_media_usage` function and writes the matches to a CSV file.

Each step connects to the WordPress database through a `db_config` dictionary that holds the user, password, host, database name, and port.


The cell below defines the `db_config` dictionary with the credentials and connection parameters for the WordPress database. It then calls `fetch_multisite_or_primary_site_urls()` to retrieve the site URLs from the database and save them to `data/site_urls.csv`.


In [None]:
from data_pipeline.get_prod_site_urls import fetch_multisite_or_primary_site_urls

# Define db_config and output_csv_file
db_config = {
    'user': 'wordpress',
    'password': 'wordpress',
    'host': 'localhost',
    'database': 'wordpress',
    'port': 3306
}

output_csv_file = 'data/site_urls.csv'

# Call the function with required arguments
fetch_multisite_or_primary_site_urls(db_config, output_csv_file)
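
The cells in this notebook each repeat the same `db_config` dictionary with hardcoded credentials. As a minimal sketch, the dictionary could instead be built once from environment variables. The helper name `load_db_config` and the `WP_DB_*` variable names are assumptions made for illustration and are not part of `data_pipeline`; the defaults mirror the values used above.

```python
import os


def load_db_config(env=None):
    """Build a db_config dict from environment variables (hypothetical helper)."""
    # Fall back to the real process environment when no mapping is passed in
    env = os.environ if env is None else env
    return {
        'user': env.get('WP_DB_USER', 'wordpress'),
        'password': env.get('WP_DB_PASSWORD', 'wordpress'),
        'host': env.get('WP_DB_HOST', 'localhost'),
        'database': env.get('WP_DB_NAME', 'wordpress'),
        # Environment values are strings, so convert the port to an integer
        'port': int(env.get('WP_DB_PORT', '3306')),
    }


db_config = load_db_config()
```

With this in place, later cells could reuse `db_config` instead of redefining it, and production credentials would stay out of the notebook file.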

In [None]:
# Fetch media data for every site listed in site_urls.csv
from data_pipeline.get_media_urls import fetch_media_data_per_site

# Define the database configuration
db_config = {
    'user': 'wordpress',
    'password': 'wordpress',
    'host': 'localhost',
    'database': 'wordpress',
    'port': 3306,
}

# Path to your site_urls.csv file
site_urls_csv = 'data/site_urls.csv'  # Adjust this path if needed

# Output directory name
output_dir_name = 'prod_site_media_csv_output'

# Now call the function with these variables
fetch_media_data_per_site(db_config, site_urls_csv, output_dir_name)
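
Because `fetch_media_data_per_site` depends on the CSV written by the previous cell, a quick pre-flight check can catch a wrong path before any database work starts. The sketch below uses only the standard library. `count_csv_rows` is a hypothetical helper, and it assumes the file has a single header row, which this notebook does not confirm.

```python
import csv
from pathlib import Path


def count_csv_rows(csv_path):
    """Return the number of data rows in a CSV file, assuming one header row."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"CSV file not found: {path}")
    with path.open(newline='', encoding='utf-8') as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the assumed header row
        return sum(1 for _ in reader)
```

Calling `count_csv_rows(site_urls_csv)` before the fetch reports how many rows are about to be processed, and raises early if the path needs adjusting.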

In [3]:
from data_pipeline.find_media_usage import find_media_usage

db_config = {
    'host': 'localhost',
    'user': 'wordpress',
    'password': 'wordpress',
    'database': 'wordpress',
    'port': 3306
}

# Look up usages of the media listed in media_csv_path within the given posts table
find_media_usage(
    db_config=db_config,
    media_csv_path='data/prod_site_media_csv_output/ias_media_20.csv',
    wp_table_name='scroll_20_posts',
    output_csv_path='data/media_usage_output/ias_usage_20.csv'
)

Finished. Found 293 media usage entries written to data/media_usage_output/ias_usage_20.csv.
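
The call above targets a single site, apparently site ID 20. To repeat it for other sites, the three site-specific arguments could be derived from the site ID. This is a sketch only: `build_usage_args` is a hypothetical helper, and it assumes the `ias_media_<id>.csv`, `scroll_<id>_posts`, and `ias_usage_<id>.csv` naming seen above holds for every site. Under WordPress's standard multisite convention the main site (ID 1) uses unnumbered tables, which here would presumably be `scroll_posts`.

```python
def build_usage_args(site_id, table_prefix='scroll_', file_prefix='ias'):
    """Return the site-specific keyword arguments for find_media_usage.

    Hypothetical helper: the naming pattern is inferred from the single
    example in this notebook and may not hold for every site.
    """
    # WordPress multisite keeps the main site (ID 1) in unnumbered tables
    if site_id == 1:
        wp_table_name = f'{table_prefix}posts'
    else:
        wp_table_name = f'{table_prefix}{site_id}_posts'
    return {
        'media_csv_path': f'data/prod_site_media_csv_output/{file_prefix}_media_{site_id}.csv',
        'wp_table_name': wp_table_name,
        'output_csv_path': f'data/media_usage_output/{file_prefix}_usage_{site_id}.csv',
    }


print(build_usage_args(20))
```

The result could then be passed along as `find_media_usage(db_config=db_config, **build_usage_args(20))`, which reproduces the arguments used in the cell above.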
