Small images #1

Open
OrsonDeWitt opened this issue Sep 17, 2023 · 10 comments

Comments

@OrsonDeWitt

Hey, I just checked the archive and the bird images are so small it is oftentimes hard to tell what I'm looking at :) is this intended?
Also, I was wondering about the method you use to determine which photo is good and should be added to the archive? In my own project I had to manually sort them out because I often got pictures of eggs/grass/feathers/etc.

@HansSchouten
Owner

HansSchouten commented Sep 17, 2023

Hi Orson, I came across your project via Reddit. It looks great! I will soon install it and have a look in more detail.
I agree the images in this repo are quite low resolution. In my app this is important since they need to be visualized in large numbers, so file sizes need to be kept low. For example, a view of a personal life list at a custom location, with species sorted by rarity and unseen species left greyed out:
[image]
I might need to download them all again from the Flickr API and store both a thumb and original resolution.
These images are hand-picked based on Flickr API searches using Scientific name, English name and Dutch name using the Public Domain Mark and Public Domain Dedication (CC0) licenses.
I select them based on: size in the image frame, sharpness, the right wild habitat, no background distractions, lighting conditions and with preference for male adult birds in breeding plumage.
In the FaunaMap app I have made a wizard for admins to quickly go through Flickr suggested images. If you register I could make you admin if you want to have a look at it. In the near future I plan to add a voting mechanism allowing users to select specie images, including user uploaded images. This repo is currently updated automatically once a day if a specie image gets replaced.
What source of images are you using?

@OrsonDeWitt
Author

OrsonDeWitt commented Sep 17, 2023

FaunaMap? Sounds like something that I had in mind when I started 😄 is this going to be about more than birds? And you seem to have a lot more features, is this going to be a website?
I understand, but this is only for the list, right? I assume you can still click on the species and then a user would want to see a full picture. Some rarer species don't have such perfectly framed pictures, and even on a full picture it is very hard to figure out what you're looking at. So I think having a full sized picture is very important here.
I wonder, how did you calculate rarity?
I was using GBIF to get links for bird images, so most of them are links to inaturalist. Doing it manually sounds like a pain, considering there are more than 10000 bird species. Are you planning to do them all manually?

Edit: I just read the description of the repo :D I'll try out the website now.

@HansSchouten
Owner

HansSchouten commented Sep 17, 2023

Yes, FaunaMap is both a website (app.faunamap.nl) and an Android app with a single codebase based on this boilerplate. It is mainly birds and some mammals, for which I should still add a proper taxonomy. I hope to include all animals eventually. I might open-source the entire project, but I'm still unsure about that (it has some game elements to encourage users to make more observations, and maybe spoiling all badges/rewards is less fun).
Haha yes your project is quite similar and your species distribution / rarity is for sure more thought out. I included this smaller GBIF dataset and simply sorted by number of observations per species. It also mainly focusses on the Netherlands, so with a larger GBIF dump and some normalisation it should soon be adjusted to the user location.

True, the rarer species are a bit harder to find a quality image of and larger resolution would be useful there. I currently have 1087 images, of which 447 are CC0 so included in this repo. You are now admin, so via Admin > Specie images the Flickr API tool can be checked out. I agree it is quite a long way to 10000, but I enjoy the process and learn about species along the way. I haven't had image URLs in the dataset yet, but thanks for the suggestion I will check them out in the larger dump.

You can also access a lot more observations now (which can be turned on/off in the profile settings). To have this view worldwide in combination with location-based rarity colors to easily browse through recent and historical observations would be a nice goal to achieve. And all user observations are planned to be accessible for anyone, without any restrictions.

Btw, your project has great user species statistics and handy image batch import!

@OrsonDeWitt
Author

Ah, I was planning to do something like that as the next step after finishing BirdMApp and finding a job, but given the lack of response I figured nobody's interested in something like that. Hope it works out for you!
You should check out GBIF for the taxonomy. I struggled with the taxonomy too and opted for using HBW list before I learned that there was a better IOC bird list on GBIF available.

Well, I am definitely intrigued by the way your app shows the amount of species in each location as opposed to countries like mine does. But how do you account for vagrant species that appear once in a lifetime in a location? I think they should be marked very very clearly that they are very rare at that location so as not to confuse users like ebird does.

Wow, that looks amazing. I had to write down each photo link that was bad to exclude it when I was working with images. I also found it very satisfying to look through all the species and learn about them, I can confidently say that I know a lot more about birds than before. But your system is much better. I laughed at the dozens of selfies and vehicle images when I clicked on siberian jay though :D

Maybe you could have big sized images in this repo, but only use the small ones in your app? Still, I think having a bigger resolution on click would also be useful, as there are species that can't be discerned unless looked at very closely.

Also, for the rarer species that don't have CC0 images, maybe it could be a good incentive for people to upload photos for them and make them available for everyone (CC0?) if they could get some feature/badge/etc unlocked for it?

Yes, I find historical data to be more useful than just overall data. It's why we use GBIF :)

So you've tried it? How do you feel about the app as a whole?

@HansSchouten
Owner

HansSchouten commented Sep 19, 2023

Thanks for suggesting the GBIF taxonomy, I will have a look at it. Especially for the mammals it could be useful. For birds I will compare IOC against the eBird taxonomy I am currently using via the eBird API.

I agree on rewarding users for supplying images for rarer species, that is exactly how FaunaMap could contribute to this repo. I would like to apply some Pokemon GO principles, as that app obviously did things right to get so popular and collecting pokémon is not that different from collecting bird species. It would be nice to apply gamification to reach a broader audience for collecting observations. I know quite a few birders that don't use eBird or similar tools, which is a missed opportunity for having more scientific data available.

I am thinking a bit about country borders since even inside a country the rarity of birds changes a lot. So that's why for FaunaMap I mapped all species into 5x5km squares and show suggestions based on the selected locations. I might filter out vagrant species by looking at a sliding window of 10 years and check whether a specie is observed each year (or 80% of the years) in a specific 5km square.
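A minimal sketch of the sliding-window idea above, not FaunaMap's actual code. It assumes each observation has already been mapped to a square and arrives as a `(square_id, species, year)` tuple; the function name `regular_species` and its parameters are hypothetical.

```python
from collections import defaultdict


def regular_species(observations, end_year, window=10, min_fraction=0.8):
    """Return {square_id: set of species} seen in enough years of the window.

    observations: iterable of (square_id, species, year) tuples (assumed format).
    A species counts as regular in a square when it was recorded in at least
    min_fraction of the `window` years ending at end_year; anything below
    that is treated as a possible vagrant and left out.
    """
    first_year = end_year - window + 1
    years_seen = defaultdict(set)  # (square_id, species) -> years with a record
    for square_id, species, year in observations:
        if first_year <= year <= end_year:
            years_seen[(square_id, species)].add(year)

    regular = defaultdict(set)
    for (square_id, species), years in years_seen.items():
        if len(years) >= min_fraction * window:
            regular[square_id].add(species)
    return dict(regular)


# Usage: a bird seen every year stays, a one-off record is filtered out.
obs = [("A", "Mallard", y) for y in range(2014, 2024)]
obs.append(("A", "Siberian Jay", 2020))
print(regular_species(obs, end_year=2023))
```

Counting distinct years rather than raw observations keeps a single well-documented rarity (many records in one year) from looking like a regular species.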

Yes I tried out the BirdMApp, great project! I liked the way you list the species per country with bold, less bold and rarity color hue. The vagrant species and ordering per country are handled much better than in FaunaMap right now. And you have indeed larger images which are more clear, so will definitely improve that in this project. I also liked the way distribution is shown. Especially for the different months of the year as this is very important. I like to have an easy way of accessing the data in the field, that is why I made a website & app, maybe that could be done with your project as well. And otherwise apps like eBird can be used and exports loaded into the BirdMApp desktop application for viewing statistics later, which already works perfectly fine.

@OrsonDeWitt
Author

Ha-ha, this is crazy. That's exactly what I was thinking. Make it like a pokedex, where you can collect pokemon from all the regions. There's just one thing that I couldn't figure out, and that's whether it should have image recognition or not. I imagined that having it would mean that users can't game it (and it would have to be really good) and it's therefore more likely to be picked up as a game that people share with each other and compete. On the other hand, it would be frustrating if it wasn't perfect, so then it'd be better to let everyone play the way they want to, even if cheating. But then it's not useful for researchers.

Why are you using ebird instead of GBIF? It has a much broader spectrum of observations, whereas ebird is mainly amateur data. I've noticed that some countries have their own datasets that they upload to GBIF but are not present in ebird. Also, I don't think you'll see the variety of birds in remote locations like Antarctica if you only take ebird data.

Thanks! I am more familiar with R than with any other programming language, so it was easier for me to make in R. Unfortunately, R/Shiny isn't very portable, so if I had to make it useful for other platforms I would have to make everything from scratch, and, as I said, I put it on the backburner since I need to find a job before I continue with hobby projects. But now that I see you working on this I know that I don't even have to do that. I'd be happy to help here instead!

P.S. singular for "species" in English is "species" :)

@HansSchouten
Owner

HansSchouten commented Sep 19, 2023

Haha, exactly! The "My species list" should be your pokedex:
[image]
And the pokedex should be sorted on local rarity (still todo, now it is from The Netherlands). Or a more general view of bird families, similar to the pokémon generations.

I used GBIF data for the local species lists, but am currently processing the 500GB dataset which is needed to make it work and adapt worldwide (completed very soon). The eBird API was just for obtaining an initial taxonomy and I used it to generate some observations to view on the map for the initial users. I know lots of birders and photographers that prefer recent data over GBIF data, since they want to have more certainty to observe a target species (is it actually there at the moment?). So if these people have some quick insights that are useful to "twitch" to a recent animal sighting, they might be drawn by gamification and ease-of-use to actually start contributing (useful scientific) observations via FaunaMap themselves.

Forcing image recognition is tricky, since not all birders are photographers so it would exclude lots of otherwise useful birders from contributing. And suboptimal performing recognition is for example why iNaturalist Seek is having quite a lot of bad reviews. I use ObsIdentify and Merlin image recognition often and they are pretty good, but still not perfect. I have yet to find out if there are APIs or open models for bird image recognition, similar to BirdNET-Analyzer for audio recordings.

Preventing users from gaming the system with fake observations is a good point and still unsolved; any help is much appreciated. Currently adding an observation is +20XP and adding proof boosts it to +250XP, maybe tricks like that will work. Possibly with a requirement or XP boost for actually being there at the moment of adding the observation. Ruining your own dex to complete badges is still possible, but since the app targets birders that want to actually keep track of species at different locations, I guess this would not be a very common issue. Completing short fun quizzes to check user competence (in order to level up or add a very rare sighting) could also be a possibility. I am also considering adding "Gyms" at public birding hides and other natural hotspots (OpenStreetMap dumps) for users to show off their nearby sightings (only with proof).

I don't have experience in R, so unfortunately don't know how to help porting it to another language or app format. But since a main goal of FaunaMap is collecting and exporting observations it would of course become possible to combine these apps, exporting from FaunaMap to BirdMApp.

Haha thanks for pointing out, a lot of English should still be double-checked and if I switch my profile to English I have some JS glitches. I wrote it in Dutch first, but soon it will fully function. If you have any time to contribute (ideas or code) any moment later in the future it would be much appreciated (I could join your Discord if you want to have a chat later on). But already thanks now for sharing your insights!

@OrsonDeWitt
Author

Awesome, but I was also thinking about separation by regions (Sinnoh, Hoenn, etc being equal to biogeographic regions, some species naturally overlapping) and pokemon-like rarity ("pseudo-legendaries", "legendaries", "mythicals")

For me, the only recent data that makes sense is the immediate past, like a month at most. Otherwise, I prefer to see overall data throughout the recent years (say, 10 years), hence my app. I don't care about birds seen in a hotspot last year because it could have been an outlier and it won't help me at all this year. So I take the average and hope for the best.

500GB seems a lot. Are you cleaning the dataset for duplicates, bad coordinates, old records? With what tool are you processing it?

You're right, so image recognition is not the way, but then how do you ensure that all these observations are actually useful and true? I think the venn diagram of birders and those that need such an app to go out and explore barely overlaps. I think that if you are making such a gamified app (plus, you said it's not going to be only birds), you're already expanding the target audience to more than just birders. And if the app were to become popular among more than just birders, there's a chance there would be people that would fake their records. I don't know if XP boost is going to stop somebody, but it's a great step in the right direction, imo. Short quizzes are also a great idea, but if the app is popular, you can bet there will be posts online outlining how to answer those quizzes. What would count as proof to you, by the way?

Well, it takes the data from GBIF, so if you send data from FaunaMap to GBIF one day, it'll be there :) by the way, if pokedex is the end goal, why did you decide to call it FaunaMap instead of, say, FaunaDex?

You could just add me on discord, it's @ orsondw :) I'm not very good at JS at all, but I am good at working with data and languages. I can proofread your English if you'd want me to, and see if there's anything else I can help you with on the data/website front

@HansSchouten
Owner

HansSchouten commented Sep 23, 2023

Good suggestion, similar to the scatterbugs (vivillon), i.e. having different collections at the various continents (with only endemic birds). It would be cool to have these game elements based on real information to make it: fun, educational, to raise awareness and to collect useful observations.

I agree, I prefer recent data with a fallback on long term averages (number of years it has been seen in a specific place) for overall likeliness and for finding hotspots to travel to.

500GB was an unzipped file of >1 billion bird observations, I think you used the same GBIF selection. I batch process it in Python using Pandas DataFrames on some powerful servers (32 cores, 126GB RAM) I already have for other projects. I just released an initial worldwide dataset in the FaunaMap's Discover Locations tab. I will process a few more times coming weeks to extract multiple datasets (per country, per month, etc).
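As a rough illustration of batch processing a file that does not fit in memory with Pandas, the sketch below counts observations per species chunk by chunk. It assumes a tab-separated occurrence file with a `species` column (the layout of a GBIF simple download, as far as I know); the function name `count_observations` is hypothetical and this is not the pipeline described above.

```python
import io
from collections import Counter

import pandas as pd


def count_observations(source, chunksize=1_000_000):
    """Count observations per species without loading the whole file.

    source: path or file-like object with tab-separated occurrence records
    and a `species` column (assumed layout).
    """
    totals = Counter()
    reader = pd.read_csv(source, sep="\t", usecols=["species"], chunksize=chunksize)
    for chunk in reader:
        # Rows without a species name are skipped before counting.
        totals.update(chunk["species"].dropna().value_counts().to_dict())
    return totals


# Usage with a tiny in-memory sample instead of the real dump.
sample = "species\tyear\nMallard\t2020\nMallard\t2021\nSiberian Jay\t2021\n\t2022\n"
print(count_observations(io.StringIO(sample), chunksize=2).most_common())
```

Only the columns named in `usecols` are parsed, which keeps memory per chunk small; per-country or per-month datasets would add those columns and group on them.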

Yes, I agree this is a difficult project to correctly gamify. I don't have the solution yet, but if it eventually works out it could be very powerful, so worth thinking about :) Agreed on the limitations of quizzes. At least with a combination of small requirements like quizzes (and many more) faking becomes less attractive because of the increasing time and effort it takes. Another solution could be verifiers. For each observation that I make I only get a very small amount of XP. If a different user adds the same species at roughly the same location within 24h (or a week, depending on the remoteness of the site), the first user will get more rewards. Each additional user would yield all prior users a reward. And the reward could be based on rarity, so a verified swan would yield less rewards than a rare eagle. This of course brings the problem of the main user having duplicate accounts, but it at least adds to the effort required to fake. And duplicate accounts could be checked by storing a set of hashed recent IP addresses and computing the similarity between users (not banning users, just automatically halting the verification rewards). Also, accounts that don't have more than X% first species observations won't verify and reward others. If at the end of the day going out into nature yields more XP than faking and confirming your own observations, it will work.
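The verifier idea could be sketched roughly as follows. Everything here is an assumption for illustration: the function names, the salt, the Jaccard measure for "similarity between users", and the 0.5 cut-off are not from FaunaMap.

```python
import hashlib


def hash_ip(ip, salt="demo-salt"):
    """Keep only a salted hash of an IP address (salt value is a placeholder)."""
    return hashlib.sha256((salt + ip).encode("utf-8")).hexdigest()


def ip_similarity(hashes_a, hashes_b):
    """Jaccard similarity between two users' sets of recent hashed IPs."""
    a, b = set(hashes_a), set(hashes_b)
    return len(a & b) / len(a | b) if (a or b) else 0.0


def verification_rewards(observers, rarity_bonus, recent_ips, max_similarity=0.5):
    """Pay rarity_bonus XP to each earlier observer per new, unrelated observer.

    observers: user ids in the order they reported the same species at roughly
    the same place within the verification window.
    recent_ips: {user_id: set of hashed IPs}.
    """
    rewards = {}
    confirmed = []
    for user in observers:
        if user in confirmed:
            continue  # a repeat report by the same user verifies nothing
        for earlier in confirmed:
            similarity = ip_similarity(recent_ips.get(user, ()), recent_ips.get(earlier, ()))
            if similarity < max_similarity:  # likely a different person
                rewards[earlier] += rarity_bonus
        confirmed.append(user)
        rewards[user] = 0
    return rewards


# Usage: "anna2" shares anna's IP, so it does not verify her; "ben" does.
ips = {
    "anna": {hash_ip("10.0.0.1")},
    "anna2": {hash_ip("10.0.0.1")},
    "ben": {hash_ip("10.0.0.2")},
}
print(verification_rewards(["anna", "anna2", "ben"], rarity_bonus=50, recent_ips=ips))
```

Suspected duplicate accounts are not banned here; their confirmations simply earn nothing, which matches the "halt the rewards" approach described above.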

For me proof is: image, video, audio and other users that have made a similar observation. Users could also gather XP by spending some time in the proof verification area. A random user image/video/audio is provided (only from species the user has already observed in his dex) and the user has 3 simple buttons: correct, unsure, wrong (with species suggestion). The determination is accepted if it reaches a >80% certainty score. Possibly weighing user choices by the level they are in, adding more weight to the choice of more advanced users.
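A small sketch of the level-weighted vote described above. The function name `verdict`, the vote format, and the symmetric "rejected" outcome are assumptions added for illustration; the source only states the >80% acceptance rule and the idea of weighting by user level.

```python
def verdict(votes, threshold=0.8):
    """Decide on a piece of proof from level-weighted votes.

    votes: list of (choice, user_level) with choice in
    {"correct", "unsure", "wrong"}; user_level acts as the vote's weight.
    Returns "accepted", "rejected" or "pending".
    """
    total = sum(level for _, level in votes)
    if total == 0:
        return "pending"
    correct = sum(level for choice, level in votes if choice == "correct")
    wrong = sum(level for choice, level in votes if choice == "wrong")
    if correct / total > threshold:
        return "accepted"
    if wrong / total > threshold:  # assumed mirror rule, not from the thread
        return "rejected"
    return "pending"


# Usage: two experienced users outweigh one beginner who disagrees.
print(verdict([("correct", 10), ("correct", 5), ("wrong", 1)]))
```

"Unsure" votes count toward the total weight but toward neither side, so a pile of uncertain answers keeps the proof pending rather than pushing it over the threshold.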

Haha FaunaDex. I made the first version of FaunaMap in 2019 when the idea was a map to see recent rarities from external sources and adding your own observations was not possible.
Great, thanks! I added you on Discord. Any help on checking the app or suggesting gamification and other improvements is more than welcome!

@OrsonDeWitt
Author

OrsonDeWitt commented Oct 2, 2023

Yeah. I think it'd be lots of fun not only to play it, but to create this system, too!

Hm, I see. I didn't trouble myself with figuring out AWS, so I did everything in R with my own machine. I think I promptly took some years off the life of my RAM, lol. Are you using AWS?

Also, having to pass a quiz every time could get tiring, like it's tiring for me to enter every single detail into ebird about each mallard duck that I for some reason decided to record. I really like the idea of verifiers, that's definitely a good solution, at least until there's many users with incentives to cheat. This problem might never even materialize, so then it would be a permanent solution that just works very well.

Sounds like wayfarer :P I'll write you in discord
