We need to find bumblebees whose names contain a word ending in "ern" or "ed", or a charming hyphen. Our Queen Bee, a Southern belle with a flair for magnolias, believes these names hint at the finest nectar. Can you be on the lookout for maybe a Buzz-ern, Dappled-ed, or Polka-dotted bee to keep her hive the envy of the meadows?

In [None]:
/*locate certain bee populations by name pattern*/
/*regex -specificity, precision & density*/

proc print data=dst3;
/* "ed" OR "ern" FOLLOWED BY a non-word character (space, hyphen, punctuation) */
/* OR */
/* any value containing a hyphen */
where prxmatch('/((ed|ern)\W)|\-/',commonname);
run;

/* Can the contains operator perform better? */
proc print data=dst3;
where commonname contains 'ed' or commonname contains 'ern' or commonname contains '-';
run;

/* Certainly, the Like operator must perform better */
proc print data=dst3;
where commonname like '%ed%' or commonname like '%ern%' or commonname like '%-%';
run;

Regular Expression Breakdown

(ed|ern): This part of the regex uses a group to match either "ed" or "ern". The vertical bar | acts as an "OR" operator, so the group matches if either "ed" or "ern" is found.

\W: Matches any non-word character. A "word" character is typically any letter, digit, or underscore (\w), so \W matches anything that is not a letter, digit, or underscore. This includes spaces, punctuation, etc.

(ed|ern)\W: This combination means that "ed" or "ern" must be followed by a non-word character. This ensures that the regex matches "ed" or "ern" only if they are at the end of a word or followed by something that is not part of the word (e.g., a space or punctuation).

\-: Matches the literal hyphen character (-). The backslash (\) escapes the hyphen so that the regex treats it as a literal character. A hyphen only takes on a special meaning (a range of characters) inside square brackets, so the escape is optional here, but it is harmless and makes the intent explicit.

|: The "OR" operator allows the regex to match either the pattern on the left side or the pattern on the right side of it.

((ed|ern)\W)|\-: This entire expression matches either of two things: any occurrence of "ed" or "ern" followed by a non-word character (like a space or punctuation), or any hyphen (-) anywhere in the string.

The pattern can be tested interactively at https://regex101.com/
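To see the pattern at work, here is a minimal sketch that runs it against a few made-up names. The data set bee_names and every name in it are hypothetical and are not taken from dst3. PRXMATCH returns the position of the first match, or 0 when nothing matches, and the FIND flag next to it shows what a plain substring test (similar to CONTAINS) would pick up.

In [None]:
/* hypothetical sample names for illustration only, not from dst3 */
data bee_names;
    length commonname $40;
    infile datalines truncover;
    input commonname $40.;
    /* position of the first regex match, 0 when there is none */
    match_pos = prxmatch('/((ed|ern)\W)|\-/', commonname);
    /* 1 when the substring "ed" appears anywhere in the name */
    contains_ed = (find(commonname, 'ed') > 0);
    datalines;
Red-belted Bumble Bee
Western Bumble Bee
Redwood Bee
Dappled
;
run;

proc print data=bee_names;
run;

"Redwood Bee" is the interesting row: it contains "ed", but the next character is a letter, so the regex returns 0 while the substring flag is 1. "Dappled" still matches even though nothing follows "ed" in the text, because SAS pads character values with trailing blanks and a blank counts as a non-word character. A value that fills the variable's full length would have no padding and would not match that way.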

Buzzing Around: Mapping Bumblebee Hotspots!

Let's track down where these fuzzy friends are hanging out the most. From hot and arid Arizona to the cool climes of Ontario, grab your data nets and let's discover the ultimate bee hangouts!

In [None]:
title "Count of bees by scientific name and state/province";
proc sql;
select  scientificname, stateprovince,  count(scientificname) as count 'Number of Bees'
from dst1
group by 2,  1
order by 3 desc,2, 1
;