# Analyze Camera Specs

I scraped the Macaulay Library for the camera models and lenses (if available) behind the top photos of ~3,000 individual photographers. Now it's time to see what gear they're using!

In [142]:
import pandas as pd
import numpy as np

In [143]:
camera_specs_df = pd.read_csv('../results/camera_specs.csv')
camera_models_df = pd.read_csv('../data/camera_models.csv')

In [144]:
print(f"There are exactly {len(camera_specs_df.model.unique())} camera models in the dataset.")

There are exactly 200 camera models in the dataset.


There are 200 distinct model names in the dataset, although some of these might be the same camera under different names. I'll probably need to do some manual data processing.
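
Before any manual cleanup, a quick automated pass can flag raw model strings that probably refer to the same camera. The sketch below is only an illustration: the `normalize_model` and `group_variants` helpers and the sample strings are hypothetical, and the pass merely case-folds and collapses whitespace, so it would not catch cameras sold under genuinely different names.

```python
from collections import defaultdict


def normalize_model(raw: str) -> str:
    """Case-fold and collapse whitespace so trivially different spellings match."""
    return " ".join(raw.split()).casefold()


def group_variants(models):
    """Map each normalized key to the sorted raw spellings seen for it."""
    groups = defaultdict(set)
    for raw in models:
        groups[normalize_model(raw)].add(raw)
    # Keep only keys with more than one raw spelling: likely duplicates.
    return {key: sorted(raws) for key, raws in groups.items() if len(raws) > 1}


# Hypothetical sample; in the notebook this could be camera_specs_df['model'].unique()
sample_models = ["NIKON D500", "Nikon D500", "Canon EOS R5", "Canon  EOS R5", "NIKON D850"]
duplicates = group_variants(sample_models)
print(duplicates)
```

Anything this flags still needs a human look, since two spellings that normalize to the same key are only *probably* the same camera.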

In [145]:
(camera_specs_df['model']
    .value_counts()
    .reset_index()
).head(10)

Unnamed: 0,model,count
0,Canon EOS 7D Mark II,400
1,NIKON D500,319
2,Canon EOS R5,120
3,Canon EOS 7D,82
4,NIKON D850,80
5,Canon EOS R7,73
6,NIKON D7200,73
7,NIKON D7500,71
8,Canon EOS 80D,65
9,Canon EOS 5D Mark IV,59


The **Canon EOS 7D Mark II** and **NIKON D500** are by far the most common cameras in this dataset. Before doing this analysis, I had already noticed that both cameras were responsible for a large percentage of the nice photos from my local area. It's interesting to see "global" trends reflecting the preferences of photographers in my local park.

## What can we learn here?

This isn't a particularly *scientific* analysis, and several confounders likely influence the top cameras. Ultimately, camera choice is a product of industry preferences, blogs, vlogs, and all the subjective stuff I hoped to avoid by doing this analysis.

So what can we learn from this dataset? These cameras are responsible for the most highly rated images of ~3,000 individual photographers from around the world. Although the cameras at the top of this dataset reflect *preference*, they are also *proven tools* of wildlife photography. This isn't to say that cameras absent from this dataset aren't capable, only that the cameras in it are *safe* choices for those looking to get into wildlife photography.

There are other relevant bits of information here that I'll explore below:

1. Which camera model provides the best bang-for-your-buck?

2. How have preferences/popularity changed over time? Are some cameras on the up-and-up while others are falling out of style?

3. What is the distribution of focal lengths and f-stops used to take successful images? Both are key factors that determine lens choice (and price).

I need to intersect this dataset of ~200 cameras with current price information and clean up the names. I did this manually. It was a huge pain in the ass, but should still be useful if I expand this dataset in the future.

In [146]:
# Join the camera specs with the camera models
camera_models_df["id"] = camera_models_df['brand'] + ' ' + camera_models_df['name']
camera_specs_df = camera_specs_df.merge(camera_models_df[['model', 'id']], on='model', how='left')

# Calculate a weighted rating for each photo: scale the rating by
# log1p(num_ratings), normalized to the most-rated photo in the dataset
camera_specs_df['log_normalized_rating'] = camera_specs_df['rating'] * (np.log1p(camera_specs_df['num_ratings']) / np.log1p(camera_specs_df['num_ratings'].max()))
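
To make the weighting concrete: each rating is multiplied by `log1p(num_ratings) / log1p(max num_ratings)`, so the most-rated photo keeps its full rating and photos with few ratings are shrunk toward zero. Here is a small sketch with made-up numbers (the toy values are assumptions, not rows from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical photos: a perfect score with one vote, and two well-rated photos
toy = pd.DataFrame({
    "rating": [5.0, 4.8, 4.9],
    "num_ratings": [1, 50, 400],
})

# Weight is 1.0 for the most-rated photo and smaller for everything else
weight = np.log1p(toy["num_ratings"]) / np.log1p(toy["num_ratings"].max())
toy["log_normalized_rating"] = toy["rating"] * weight
print(toy.round(2))
```

In this toy data the 5.0 photo with a single rating ends up well below the 4.8 photo with fifty ratings, which is the point of the weighting.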

In [147]:
# Count the number of photos per camera
camera_counts = camera_specs_df['id'].value_counts().reset_index()
camera_counts.head(10)

Unnamed: 0,id,count
0,Canon EOS 7D Mark II,400
1,Nikon D500,319
2,Canon EOS R5,120
3,Canon EOS 7D,82
4,Nikon D850,80
5,Canon EOS R7,73
6,Nikon D7200,73
7,Nikon D7500,71
8,Canon EOS 80D,65
9,Canon EOS 5D Mark IV,59


In [148]:
# Calculate the average rating and weighted rating for each camera
camera_ratings = (
    camera_specs_df
        .groupby('id')
        .agg({
            'rating': 'mean',
            'log_normalized_rating': 'mean'
        })
        .round(2)
        .rename(columns={
            'rating': 'avg_rating',
            'log_normalized_rating': 'avg_weighted_rating'
        })
        .sort_values('avg_weighted_rating', ascending=False)
        .reset_index()
)
camera_ratings.head(10)

Unnamed: 0,id,avg_rating,avg_weighted_rating
0,Pentax K-3 II,4.83,4.53
1,Nikon D3,4.86,4.53
2,Sony Cyber-shot DSC-H400,4.32,3.45
3,Apple iPhone 15 Pro,4.86,3.44
4,Canon EOS-1D X Mark III,4.92,3.33
5,Canon EOS Rebel T6s,4.98,3.32
6,Apple iPhone 12 Pro,4.82,3.3
7,Canon EOS-1D X,4.92,3.28
8,Nikon Z6,4.92,3.27
9,Sony Alpha SLT-A57,4.98,3.27
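
Per-camera means like these are sensitive to sample size: a camera with a single highly rated photo can outrank one with hundreds. One possible guard, sketched below with made-up data, is to keep the photo count alongside the mean and rank only cameras above a threshold. The `min_photos` cutoff and the toy rows are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

# Hypothetical data: Camera A has four photos, Camera B has one standout photo
toy = pd.DataFrame({
    "id": ["Camera A"] * 4 + ["Camera B"],
    "log_normalized_rating": [3.0, 3.2, 2.8, 3.0, 4.5],
})

min_photos = 3  # hypothetical cutoff; tune to the dataset

ranked = (
    toy.groupby("id")
       .agg(
           n_photos=("log_normalized_rating", "count"),
           avg_weighted_rating=("log_normalized_rating", "mean"),
       )
       # Drop cameras with too few photos to average meaningfully
       .loc[lambda d: d["n_photos"] >= min_photos]
       .sort_values("avg_weighted_rating", ascending=False)
       .reset_index()
)
print(ranked)
```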


In [149]:
# Calculate the most popular lens for each camera
lens_popularity = (
    camera_specs_df
        .groupby(['id', 'lens'])
        .size()
        .reset_index(name='count')
        .sort_values('count', ascending=False)
        .groupby('id')
        .first()
        .rename(columns={
            'lens': 'popular_lens',
            'count': 'popular_lens_count'
        })
        .sort_values('popular_lens_count', ascending=False)
        .reset_index()
)
lens_popularity.head(10)

Unnamed: 0,id,popular_lens,popular_lens_count
0,Canon EOS 7D Mark II,EF100-400mm f/4.5-5.6L IS II USM,155
1,Nikon D500,200.0-500.0 mm f/5.6,72
2,Canon EOS R5,RF100-500mm F4.5-7.1 L IS USM,54
3,Canon EOS R7,RF100-500mm F4.5-7.1 L IS USM,34
4,Canon EOS 7D,EF400mm f/5.6L USM,20
5,Nikon D850,500.0 mm f/5.6,20
6,Canon EOS 5D Mark IV,EF100-400mm f/4.5-5.6L IS II USM,19
7,Nikon D7200,200.0-500.0 mm f/5.6,18
8,Canon EOS 90D,EF100-400mm f/4.5-5.6L IS II USM,18
9,Nikon D7500,200.0-500.0 mm f/5.6,17


In [150]:
# Determine the most highly rated photo taken with each camera
keep_columns = ['id', 'rating', 'num_ratings', 'catalog_number', 'common_name', 'photographer', 'year']
rename_columns = {
    'rating': 'best_photo_rating',
    'num_ratings': 'best_photo_num_ratings', 
    'catalog_number': 'best_photo_catalog_number', 
    'common_name': 'best_photo_bird', 
    'photographer': 'best_photo_photographer', 
    'year': 'best_photo_year'
}
best_photos = (
    camera_specs_df
        .sort_values('log_normalized_rating', ascending=False)
        .groupby('id')
        .first()
        .sort_values('log_normalized_rating', ascending=False)
        .reset_index()
        [keep_columns]
        .rename(columns=rename_columns)
)
best_photos['best_photo_link'] = 'https://macaulaylibrary.org/asset/' + best_photos['best_photo_catalog_number'].astype(str)
best_photos.head(10)

Unnamed: 0,id,best_photo_rating,best_photo_num_ratings,best_photo_catalog_number,best_photo_bird,best_photo_photographer,best_photo_year,best_photo_link
0,Sony Cyber-shot DSC-RX10 IV,4.92,475,373545991,Turkey Vulture,Kathryn Young,2021,https://macaulaylibrary.org/asset/373545991
1,Sony Alpha 1,4.94,461,502260161,Short-eared Owl,Nathan Kelley,2022,https://macaulaylibrary.org/asset/502260161
2,Canon EOS R5,4.94,452,534030941,Harlequin Duck,Josh Cooper,2023,https://macaulaylibrary.org/asset/534030941
3,Nikon D500,4.94,428,432138011,Cliff Swallow,Anne Spiers,2022,https://macaulaylibrary.org/asset/432138011
4,Canon EOS 7D Mark II,4.93,426,87875731,Aplomado Falcon,Christian Fernandez,2018,https://macaulaylibrary.org/asset/87875731
5,Canon EOS R6,4.97,357,448672911,Cerulean Warbler,Joley Sullivan,2022,https://macaulaylibrary.org/asset/448672911
6,Nikon D850,4.91,339,491555361,Cooper's Hawk,Jen Davis,2022,https://macaulaylibrary.org/asset/491555361
7,Nikon D3,4.86,314,318547341,Great Horned Owl,Nathalie Talbot,2021,https://macaulaylibrary.org/asset/318547341
8,Pentax K-3 II,4.83,323,80930791,Red-tailed Hawk (abieticola),Vincent Fyson,2018,https://macaulaylibrary.org/asset/80930791
9,Nikon Z9,4.93,269,611064226,Evening Grosbeak,FABRICE SIMON,2023,https://macaulaylibrary.org/asset/611064226
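
One pandas detail worth checking here: `GroupBy.first()` returns the first non-null value in each column, not the first row. If the top-ranked row for a camera has a missing field, the result can blend values from different photos. The sketch below uses made-up rows (all names and values are hypothetical) to contrast it with `drop_duplicates`, which keeps whole rows.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "id": ["Camera A", "Camera A"],
    "log_normalized_rating": [4.9, 4.1],
    "photographer": ["Alice", "Bob"],
    "year": [np.nan, 2019],  # the top-rated row is missing its year
}).sort_values("log_normalized_rating", ascending=False)

# first() takes the first non-null value per column, so the year comes from Bob's row
blended = toy.groupby("id").first().reset_index()

# drop_duplicates keeps the entire top row, missing year included
whole_row = toy.drop_duplicates("id", keep="first").reset_index(drop=True)

print(blended)
print(whole_row)
```

If none of the kept columns contain missing values, the two approaches should agree, so this only matters when the scraped metadata has gaps.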


In [153]:
# Join the camera counts, camera ratings, lens popularity, and best photos
camera_summary = (
    camera_counts
        .merge(camera_ratings, on='id', how='left')
        .merge(lens_popularity, on='id', how='left')
        .merge(best_photos, on='id', how='left')
        .merge(camera_models_df.drop(columns=['model']), on='id', how='left')
)
# Remove rows with a missing or empty name (read_csv loads empty fields as NaN)
camera_summary = camera_summary[camera_summary['name'].notna() & (camera_summary['name'] != '')]
# Drop duplicate rows created when several model names map to the same id
camera_summary = camera_summary.drop_duplicates()

In [154]:
camera_summary.to_csv('../results/camera_summary.csv', index=False)