
In [1]:
!pip install pyspark

Collecting pyspark
  Downloading pyspark-3.5.3.tar.gz (317.3 MB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pyspark
  Building wheel for pyspark (setup.py) ... done
  Created wheel for pyspark: filename=pyspark-3.5.3-py2.py3-none-any.whl size=317840625 sha256=514bff0fb38c5afa22358f4c543ba3d447b876cd901e3f66587cf5bfab6059bf
  Stored in directory: /root/.cache/pip/wheels/1b/3a/92/28b93e2fbfdbb07509ca4d6f50c5e407f48dce4ddbda69a4ab
Successfully built pyspark
Installing collected packages: pyspark
Successfully installed pyspark-3.5.3


In [2]:
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Initialize Spark session
spark = SparkSession.builder.master("local").appName("test").getOrCreate()

In [3]:
# Create a DataFrame with alphanumeric values
from pyspark.sql.types import StructType, StructField, StringType

# getOrCreate() returns the Spark session already started above
spark = SparkSession.builder.appName("AlphanumericExample").getOrCreate()

# Define the schema
schema = StructType([
    StructField("ID", StringType(), True),
    StructField("Description", StringType(), True)
])

# Sample data with alphanumeric values
data = [
    ("1", "Product123"),
    ("2", "Service#456"),
    ("3", "Item!789"),
    ("4", "AlphaTest999"),
    ("5", "Beta_Test_ABC"),
    ("6", "Gamma-321"),
    ("7", "NonAlphanumeric"),
    ("8", "2024_Product"),
    ("9", "Mixed_Values#123")
]

# Create the DataFrame
df = spark.createDataFrame(data, schema=schema)

# Register the DataFrame as a temporary SQL view
df.createOrReplaceTempView("AlphanumericTable")

# Show the DataFrame
df.show()


+---+----------------+
| ID|     Description|
+---+----------------+
|  1|      Product123|
|  2|     Service#456|
|  3|        Item!789|
|  4|    AlphaTest999|
|  5|   Beta_Test_ABC|
|  6|       Gamma-321|
|  7| NonAlphanumeric|
|  8|    2024_Product|
|  9|Mixed_Values#123|
+---+----------------+



**Query to Find Rows with Alphanumeric Values Using LIKE**

In [5]:
spark.sql(""" select * from AlphanumericTable """).show()

+---+----------------+
| ID|     Description|
+---+----------------+
|  1|      Product123|
|  2|     Service#456|
|  3|        Item!789|
|  4|    AlphaTest999|
|  5|   Beta_Test_ABC|
|  6|       Gamma-321|
|  7| NonAlphanumeric|
|  8|    2024_Product|
|  9|Mixed_Values#123|
+---+----------------+



In [4]:
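# Spark SQL's LIKE supports only the % and _ wildcards. The bracket classes
# [a-zA-Z] and [0-9] are matched as literal text, so this query returns no rows.
# Character classes need RLIKE (regular expressions) instead.
spark.sql("""
SELECT * FROM AlphanumericTable
WHERE Description LIKE '%[a-zA-Z]%' AND Description LIKE '%[0-9]%'
""").show()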

+---+-----------+
| ID|Description|
+---+-----------+
+---+-----------+



In [8]:
spark.sql("""
SELECT * FROM AlphanumericTable WHERE Description LIKE 'Product%'
""").show()

+---+-----------+
| ID|Description|
+---+-----------+
|  1| Product123|
+---+-----------+



In [9]:
# Query to find descriptions with special characters.
# In a LIKE pattern, _ matches any single character, so it must be escaped
# to match a literal underscore. ESCAPE '/' declares / as the escape character.
spark.sql("""
SELECT *
FROM AlphanumericTable
WHERE Description LIKE '%#%' OR Description LIKE '%!%' OR Description LIKE '%/_%' ESCAPE '/'
""").show()


+---+----------------+
| ID|     Description|
+---+----------------+
|  2|     Service#456|
|  3|        Item!789|
|  5|   Beta_Test_ABC|
|  8|    2024_Product|
|  9|Mixed_Values#123|
+---+----------------+
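
In a LIKE pattern only `%` (any sequence of characters) and `_` (exactly one character) are wildcards. An unescaped `'%_%'` therefore matches every non-empty string, and bracket expressions such as `[0-9]` are plain text. The pure-Python sketch below emulates these rules to make them easy to experiment with. It is a simplified illustration, not Spark's implementation, and the helper names `like_to_regex` and `like` are hypothetical.

```python
import re

def like_to_regex(pattern, escape=None):
    """Translate a SQL LIKE pattern into a Python regex string.

    Only % and _ act as wildcards. Every other character, including
    [ and ], is matched literally.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if escape is not None and ch == escape and i + 1 < len(pattern):
            # The character following the escape character is literal
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")   # any sequence, including empty
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return "".join(parts)

def like(value, pattern, escape=None):
    """Return True if the whole value matches the LIKE pattern."""
    return re.fullmatch(like_to_regex(pattern, escape), value, flags=re.DOTALL) is not None

print(like("Product123", "%_%"))                    # unescaped _ is a wildcard
print(like("Product123", "%/_%", escape="/"))       # escaped _ needs a real underscore
print(like("Beta_Test_ABC", "%/_%", escape="/"))
print(like("Product123", "%[a-zA-Z]%"))             # brackets are literal text
```

The four calls print `True`, `False`, `True`, `False` in that order.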



**Using PySpark Functions to Check Alphanumeric Values**

In [10]:
from pyspark.sql.functions import col

# Check for both letters and numbers using rlike (regular expression).
# rlike matches anywhere in the string, so no leading or trailing .* is needed.
df_with_alphanumeric = df.filter(
    col("Description").rlike("[A-Za-z]") & col("Description").rlike("[0-9]")
)

# Show the results
df_with_alphanumeric.show()


+---+----------------+
| ID|     Description|
+---+----------------+
|  1|      Product123|
|  2|     Service#456|
|  3|        Item!789|
|  4|    AlphaTest999|
|  6|       Gamma-321|
|  8|    2024_Product|
|  9|Mixed_Values#123|
+---+----------------+



In [11]:
# Filter rows where the description starts with "Product"
# (^ anchors the match at the start; a trailing .* is not needed)
df_with_product = df.filter(
    col("Description").rlike("^Product")
)

# Show the results
df_with_product.show()


+---+-----------+
| ID|Description|
+---+-----------+
|  1| Product123|
+---+-----------+



In [12]:
# Check for rows with special characters like #, !, or _
# (inside a regex character class, _ is a literal underscore, not a wildcard)
df_with_special_chars = df.filter(
    col("Description").rlike("[#_!]")
)

# Show the results
df_with_special_chars.show()


+---+----------------+
| ID|     Description|
+---+----------------+
|  2|     Service#456|
|  3|        Item!789|
|  5|   Beta_Test_ABC|
|  8|    2024_Product|
|  9|Mixed_Values#123|
+---+----------------+

