# Analyze Camera Specs

I scraped the Macaulay Library for the camera models and lenses (where available) behind the top photos of ~3,000 individual photographers. Now it's time to see what gear they're using!

In [15]:
import pandas as pd
import numpy as np

In [16]:
camera_specs_df = pd.read_csv('../results/camera_metadata.csv')
camera_models_df = pd.read_csv('../data/camera_models.csv')

In [17]:
print(f"There are exactly {len(camera_specs_df.model.unique())} camera models in the dataset.")

There are exactly 210 camera models in the dataset.


There are ~200 camera models represented in the dataset, although some of these might be the same camera with different names. I'll probably need to do some manual data processing.
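
One low-effort way to surface candidates for that manual cleanup is to group model strings by a normalized key (lowercased, whitespace collapsed). The sketch below uses made-up model strings, and `normalize_model` is a hypothetical helper that is not part of this notebook. It only catches case and spacing differences, so genuinely different spellings of the same camera would still need a manual mapping.

```python
from collections import defaultdict


def normalize_model(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


# Made-up examples for illustration, not values taken from the dataset
models = ["NIKON D500", "Nikon D500", "Canon EOS 7D", "Canon  EOS 7D ", "Canon EOS R5"]

# Map each normalized key to the set of raw spellings that produce it
groups = defaultdict(set)
for raw in models:
    groups[normalize_model(raw)].add(raw)

# Keep only keys that more than one raw spelling maps to
likely_duplicates = {key: sorted(names) for key, names in groups.items() if len(names) > 1}
print(likely_duplicates)
```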

In [18]:
(camera_specs_df['model']
    .value_counts()
    .reset_index()
).head(10)

Unnamed: 0,model,count
0,Canon EOS 7D Mark II,436
1,NIKON D500,320
2,Canon EOS R5,110
3,NIKON D850,92
4,Canon EOS 7D,90
5,NIKON D7200,85
6,NIKON D7500,71
7,Canon EOS 80D,63
8,Canon EOS R7,62
9,Canon EOS 5D Mark IV,61


The **Canon EOS 7D Mark II** and **NIKON D500** are the most common cameras in this dataset by far. Before doing this analysis, I had already noticed that both cameras were behind a large share of the nice photos from my local area. It's interesting to see "global" trends reflecting the preferences of photographers in my local park.

## What can we learn here?

This isn't a particularly *scientific* analysis, and several confounders likely influence the top cameras. Ultimately, camera choice is a product of industry preferences, blogs, vlogs, and all the subjective stuff I hoped to avoid by doing this analysis.

So what can we learn from this dataset? These cameras are responsible for the most highly rated images of ~3,000 individual photographers from around the world. Although the cameras at the top of this dataset represent *preference*, they also represent *proven tools* of wildlife photography. This isn't to say that cameras absent from this dataset aren't capable, only that cameras in this dataset are *safe* choices for those looking to get into wildlife photography.

There are other relevant bits of information here that I'll explore below:

1. Which camera model provides the best bang-for-your-buck?

2. How have preferences/popularity changed over time? Are some cameras on the up-and-up while others are falling out of style?

3. What is the distribution of focal lengths and f-stops used to take successful images? Both are key factors that determine lens choice (and price).

I need to intersect this dataset of ~200 cameras with current price information and clean up the names. I did this manually. It was a huge pain in the ass, but should still be useful if I expand this dataset in the future.

In [21]:
# Which camera models are in camera_specs_df and not in camera_models_df?
specs_models = camera_specs_df['model'].unique()
known_models = camera_models_df['model'].unique()
missing_models = specs_models[~np.isin(specs_models, known_models)]
if len(missing_models) > 0:
    print("Warning: The following camera models are in camera_specs_df but not in camera_models_df:")
    print(missing_models, "\n")
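
The same check can also be folded into the join itself. Below is a minimal sketch on toy stand-in frames (the rows are made up): a left merge with `indicator=True` adds a `_merge` column that flags rows with no match, and `validate="many_to_one"` raises an error if the lookup table ever contains the same `model` twice. That each raw model name appears only once in the lookup table is an assumption here.

```python
import pandas as pd

# Toy stand-ins for camera_specs_df and camera_models_df (made-up rows)
specs = pd.DataFrame({"model": ["NIKON D500", "Canon EOS R5", "Mystery Cam"]})
models = pd.DataFrame({
    "model": ["NIKON D500", "Canon EOS R5"],
    "brand": ["Nikon", "Canon"],
    "name": ["D500", "EOS R5"],
})

# indicator=True adds a '_merge' column; validate guards against duplicate lookup keys
checked = specs.merge(models, on="model", how="left", indicator=True, validate="many_to_one")

# Rows flagged 'left_only' had no match in the lookup table
missing_models = checked.loc[checked["_merge"] == "left_only", "model"].unique().tolist()
print(missing_models)
```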

In [22]:
# Join the camera specs with the camera models
camera_models_df["id"] = camera_models_df['brand'] + ' ' + camera_models_df['name']
camera_specs_df = camera_specs_df.merge(camera_models_df[['model', 'id']], on='model', how='left')

# Calculate a weighted rating for each photo: scale the rating by
# log1p(num_ratings) relative to the most-rated photo in the dataset
rating_weight = np.log1p(camera_specs_df['num_ratings']) / np.log1p(camera_specs_df['num_ratings'].max())
camera_specs_df['log_normalized_rating'] = camera_specs_df['rating'] * rating_weight
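
To get a feel for what this weighting does, here is a small worked sketch of the same formula in plain Python. The maximum of 1732 ratings and the example photos are assumed values for illustration only, and `log_normalized_rating` is a hypothetical helper name. The point is that a perfect score from a single vote ends up far below a slightly lower score backed by many votes.

```python
import math


def log_normalized_rating(rating: float, num_ratings: int, max_num_ratings: int) -> float:
    """Scale a rating by log1p(num_ratings) / log1p(max_num_ratings)."""
    return rating * (math.log1p(num_ratings) / math.log1p(max_num_ratings))


MAX_NUM_RATINGS = 1732  # assumed maximum, for illustration only

few_votes = log_normalized_rating(5.0, 1, MAX_NUM_RATINGS)
many_votes = log_normalized_rating(4.9, 1000, MAX_NUM_RATINGS)
most_rated = log_normalized_rating(4.96, MAX_NUM_RATINGS, MAX_NUM_RATINGS)

print(f"5.00 stars from 1 rating: {few_votes:.2f}")
print(f"4.90 stars from 1000 ratings: {many_votes:.2f}")
print(f"4.96 stars from the maximum number of ratings: {most_rated:.2f}")
```

Because the weight is logarithmic, going from 10 to 100 ratings adds roughly as much as going from 100 to 1,000, so the most-rated photos don't completely drown out everything else.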

In [23]:
# Count the number of photos per camera
camera_counts = camera_specs_df['id'].value_counts().reset_index()
camera_counts.head(10)

Unnamed: 0,id,count
0,Canon EOS 7D Mark II,436
1,Nikon D500,320
2,Canon EOS R5,110
3,Nikon D850,93
4,Canon EOS 7D,91
5,Nikon D7200,85
6,Nikon D7500,71
7,Canon EOS 80D,63
8,Canon EOS R7,62
9,Canon EOS 5D Mark IV,61


In [24]:
# Calculate the average rating and weighted rating for each camera
camera_ratings = (
    camera_specs_df
        .groupby('id')
        .agg({
            'rating': 'mean',
            'log_normalized_rating': 'mean'
        })
        .round(2)
        .rename(columns={
            'rating': 'avg_rating',
            'log_normalized_rating': 'avg_weighted_rating'
        })
        .sort_values('avg_weighted_rating', ascending=False)
        .reset_index()
)
camera_ratings.head(10)

Unnamed: 0,id,avg_rating,avg_weighted_rating
0,Panasonic Lumix DMC-FZ2500,4.91,4.3
1,Nikon D3,4.86,3.75
2,Pentax K-3 II,4.83,3.74
3,Nikon Z7,4.93,3.27
4,Canon EOS M5,4.87,3.25
5,Canon EOS R,4.88,3.14
6,Sony Alpha 9,4.95,3.1
7,Canon EOS R3,4.9,3.09
8,Apple iPhone XR,4.86,3.05
9,Sony Alpha 6300,4.93,3.04


In [25]:
# Calculate the most popular lens for each camera
lens_popularity = (
    camera_specs_df
        .groupby(['id', 'lens'])
        .size()
        .reset_index(name='count')
        .sort_values('count', ascending=False)
        .groupby('id')
        .first()
        .rename(columns={
            'lens': 'popular_lens',
            'count': 'popular_lens_count'
        })
        .sort_values('popular_lens_count', ascending=False)
        .reset_index()
)
lens_popularity.head(10)

Unnamed: 0,id,popular_lens,popular_lens_count
0,Canon EOS 7D Mark II,EF100-400mm f/4.5-5.6L IS II USM,162
1,Nikon D500,200.0-500.0 mm f/5.6,72
2,Canon EOS R5,RF100-500mm F4.5-7.1 L IS USM,53
3,Canon EOS R7,RF100-500mm F4.5-7.1 L IS USM,30
4,Nikon D850,500.0 mm f/5.6,20
5,Canon EOS 90D,150-600mm F5-6.3 DG OS HSM | Contemporary 015,19
6,Canon EOS 5D Mark IV,EF100-400mm f/4.5-5.6L IS II USM,19
7,Canon EOS 7D,EF400mm f/5.6L USM,18
8,Nikon D7200,200.0-500.0 mm f/5.6,17
9,Sony Cyber-shot DSC-RX10 IV,8.8-220mm f/2.4-4.0,17


In [26]:
# Determine the most highly rated photo taken with each camera
keep_columns = ['id', 'rating', 'num_ratings', 'catalog_number', 'common_name', 'photographer', 'year']
rename_columns = {
    'rating': 'best_photo_rating',
    'num_ratings': 'best_photo_num_ratings', 
    'catalog_number': 'best_photo_catalog_number', 
    'common_name': 'best_photo_bird', 
    'photographer': 'best_photo_photographer', 
    'year': 'best_photo_year'
}
best_photos = (
    camera_specs_df
        .dropna(subset=['id'])
        .sort_values('log_normalized_rating', ascending=False)
        # Keep the whole top-rated row per camera. groupby('id').first() would
        # instead take the first non-null value of each column separately.
        .drop_duplicates('id')
        .reset_index(drop=True)
        [keep_columns]
        .rename(columns=rename_columns)
)
best_photos['best_photo_link'] = 'https://macaulaylibrary.org/asset/' + best_photos['best_photo_catalog_number'].astype(str)
best_photos.head(10)

Unnamed: 0,id,best_photo_rating,best_photo_num_ratings,best_photo_catalog_number,best_photo_bird,best_photo_photographer,best_photo_year,best_photo_link
0,Canon EOS R3,4.96,1732,413988131,White-faced Storm-Petrel,JJ Harrison,2022,https://macaulaylibrary.org/asset/413988131
1,Canon EOS 7D Mark II,4.95,1535,39404621,Pied Plover,Luke Seitz,2016,https://macaulaylibrary.org/asset/39404621
2,Nikon D7100,4.92,1128,84730861,Coal Tit (Black-crested),Abhishek Das,2015,https://macaulaylibrary.org/asset/84730861
3,Sony Alpha 7 IV,4.96,995,500057771,Wine-throated Hummingbird,Daniel López-Velasco | Ornis Birding Expeditions,2022,https://macaulaylibrary.org/asset/500057771
4,Nikon D800,4.96,931,603046701,Eastern Whip-poor-will,Megan Gray,2023,https://macaulaylibrary.org/asset/603046701
5,Canon EOS 70D,4.95,938,43164261,Eurasian Dotterel,Ian Davies,2015,https://macaulaylibrary.org/asset/43164261
6,Nikon D500,4.94,924,175078971,Red-throated Loon,Bryan Calk,2019,https://macaulaylibrary.org/asset/175078971
7,Olympus OM-1,4.85,938,511620371,Dunlin,Steven Hunter,2022,https://macaulaylibrary.org/asset/511620371
8,Canon EOS-1D X,4.95,816,372577651,Red-legged Cormorant,Jory Teltser,2021,https://macaulaylibrary.org/asset/372577651
9,Sony Cyber-shot DSC-RX10 IV,4.97,729,612812569,Peruvian Racket-tail,Daysy Vera Castro,2023,https://macaulaylibrary.org/asset/612812569
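
A note on picking one row per group in pandas: `groupby(...).first()` returns the first non-null value of each column within a group, whereas `drop_duplicates` on a sorted frame keeps the entire first row. The two agree when there are no missing values, but they can differ when the top row has gaps. A toy sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Made-up data: camera A's top-scoring photo has no year recorded
photos = pd.DataFrame({
    "id": ["A", "A", "B"],
    "score": [3.0, 2.0, 1.0],
    "year": [np.nan, 2020, 2019],
}).sort_values("score", ascending=False)

# first() skips NaN per column, so A's year is borrowed from its second-best photo
per_column = photos.groupby("id").first()

# drop_duplicates keeps the top row intact, including its missing year
whole_row = photos.drop_duplicates("id").set_index("id")

print(per_column.loc["A", "year"])
print(whole_row.loc["A", "year"])
```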


In [27]:
# Join the camera counts, camera ratings, lens popularity, and best photos
camera_summary = (
    camera_counts
        .merge(camera_ratings, on='id', how='left')
        .merge(lens_popularity, on='id', how='left')
        .merge(best_photos, on='id', how='left')
        .merge(camera_models_df.drop(columns=['model']), on='id', how='left')
)
# Remove rows with a missing or empty name (read_csv loads empty fields as NaN)
camera_summary = camera_summary[camera_summary['name'].notna() & (camera_summary['name'] != '')]
# Drop duplicate rows: several model names can map to the same id
camera_summary = camera_summary.drop_duplicates()

In [29]:
camera_summary.to_csv('../results/camera_statistics.csv', index=False)